Abstract: One key requirement for storage clouds is to be 
able to retrieve data quickly. Recent system measurements have 
shown that the data retrieving delay in storage clouds is highly 
variable, which may result in a long latency tail. One crucial idea 
to improve the delay performance is to retrieve multiple data 
copies by using parallel downloading threads. However, how to 
optimally schedule these downloading threads to minimize the 
data retrieving delay remains an important open problem. 
In this paper, we develop low-complexity thread scheduling 
policies for several important classes of data downloading time 
distributions, and prove that these policies are either delay- 
optimal or within a constant gap from the optimum delay 
performance. These theoretical results hold for an arbitrary 
arrival process of read requests that may contain finite or infinite 
read requests, and for heterogeneous MDS storage codes that can 
support diverse storage redundancy and reliability requirements 
for different data files. Our numerical results show that the delay 
performance of the proposed policies is significantly better than 
that of First-Come-First-Served (FCFS) policies considered in 
prior work. 

I. Introduction 

Cloud storage is a prevalent solution for online data storage, 
as it provides the appealing benefits of easy access, low
maintenance, elasticity, and scalability. The global cloud storage
market is expected to reach $56.57 billion by 2019, with a
compound annual growth rate of 33.1% [1].

In cloud storage systems, multiple copies of data are generated
using simple replications m-m or erasure storage
codes 0-®, and stored in a distributed manner across disks,
in-memory databases, and caches. For an (n, k) erasure code (n ≥ k),
data is divided into k equal-size chunks, which are then encoded
into n chunks and stored in n distinct storage devices. If the
code satisfies the typical maximum distance separable (MDS)
property, any k out of the n chunks are sufficient to restore
the original data. When k = 1, the (n, k) erasure code reduces to
the case of data replication (aka repetition codes).
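
The "any k out of n" property can be illustrated with a toy example. The sketch below is not taken from the paper: it assumes a (3, 2) single-parity XOR code, which is a simple MDS code, and the helper names `encode` and `decode` are hypothetical.

```python
from itertools import combinations

def xor_bytes(u, v):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(u, v))

def encode(chunk_a, chunk_b):
    """Encode k = 2 equal-size data chunks into n = 3 coded chunks."""
    return [chunk_a, chunk_b, xor_bytes(chunk_a, chunk_b)]

def decode(available):
    """Restore the data from any 2 coded chunks, given as {index: chunk}."""
    if 0 in available and 1 in available:
        a, b = available[0], available[1]
    elif 0 in available:                      # chunks 0 and 2: b = a XOR parity
        a = available[0]
        b = xor_bytes(a, available[2])
    else:                                     # chunks 1 and 2: a = b XOR parity
        b = available[1]
        a = xor_bytes(b, available[2])
    return a + b

data = b"cloudstorage"
half = len(data) // 2
coded = encode(data[:half], data[half:])

# Try every pair of coded chunks: each pair should restore the file.
recovered = [decode({i: coded[i] for i in pair})
             for pair in combinations(range(3), 2)]
```

Because any two of the three chunks suffice, a reader of this toy file could launch three downloads and stop as soon as the first two finish, which is the redundancy that the scheduling policies in this paper exploit.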

Current storage clouds jointly utilize multiple erasure codes 
to support diverse storage redundancy and reliability requirements.
For instance, in Facebook’s data warehouse cluster, 
frequently accessed data (or so called “hot data”) is stored 
with 3 replicas, while rarely accessed data (“cold data”) is 
stored by using a more compressed (14,10) Reed-Solomon 
code to save space 0. Open-source cloud storage software, 
such as HDFS-RAID 0 and OpenStack Swift ®, has been 
developed to support the coexistence of multiple erasure codes. 

This work has been supported in part by an IRP grant from HP. 


One key design principle of cloud storage systems is fast 
data retrieval. Amazon, Microsoft, and Google all report that 
a slight increase in user-perceived delay will result in a 
concrete revenue loss 0, ESI. However, in current storage 
clouds, data retrieving time is highly random and may have 
a long latency tail due to many reasons, including network 
congestion, load dynamics, cache misses, database blocking, 
disk I/O interference, update/maintenance activities, and
unpredictable failures 0 , mi-tm. One important approach to 
curb this randomness is downloading multiple data copies in 
parallel. For example, if a file is stored with an (n, k) erasure 
code, the system can schedule more than k downloading 
“threads”, each representing a TCP connection, to retrieve the 
file. The first k successfully downloaded chunks are sufficient 
to restore the file, and the excess downloading threads are 
terminated to release the networking resources. By this, the 
retrieval latency of the file is reduced. However, scheduling 
redundant threads will increase the system load, which may 
in turn increase the latency. Such a policy provides a tradeoff 
between faster retrieval of each file and the extra system 
load for downloading redundant chunks. Therefore, a critical 
question is “how to optimally manage the downloading threads 
to minimize average data retrieving delay?” Standard tools 
in scheduling and queueing theories, e.g., ED-GD and the 
references therein, cannot be directly applied to resolve this 
challenge because they do not allow scheduling redundant and 
parallel resources for service acceleration. 

In this paper, we rigorously analyze the fundamental delay 
limits of storage clouds. We develop low-complexity online 
thread scheduling policies for several important classes of data 
downloading time distributions, and prove that these policies 
are either delay-optimal or within a constant gap from the 
optimum delay performance.¹ Our theoretical results hold for 
an arbitrary arrival process of read requests that may contain 
finite or infinite read requests, and for heterogeneous MDS 
storage codes that can support diverse code parameters $(n_i, k_i)$ 
for different data files. The main contributions of our paper are 
listed as follows and summarized in Table I. An interesting 
state evolution argument is developed in this work, which is 
essential for establishing the constant delay gaps; the interested 
reader is referred to the appendices for the detailed proofs. 

• When the downloading times of data chunks are i.i.d. 

¹By constant delay gap, we mean that the delay gap is bounded by a 
constant value that is independent of the request arrival process and system 
traffic load. 
| Theorem | Arrival process | Parameters of MDS codes | Service preemption | Downloading time distribution | Policy | Delay gap from optimum |
|---|---|---|---|---|---|---|
| 1 | any | $d_{\min} \ge L$ | allowed | i.i.d. exponential | SERPT-R | delay-optimal |
| 2 | any | any | allowed | i.i.d. exponential | SERPT-R | $\frac{1}{p}\sum_{l=d_{\min}}^{L-1}\frac{1}{l}$ |
| 3 | any | $d_{\min} \ge L$ | not allowed | i.i.d. exponential | SEDPT-R | $1/p$ |
| 4 | any | any | not allowed | i.i.d. exponential | SEDPT-R | $\frac{1}{p}\left(\sum_{l=d_{\min}}^{L-1}\frac{1}{l} + 1\right)$ |
| 5 | any | any | not allowed | i.i.d. New-Longer-than-Used | SEDPT-NR | $O(\ln L/p)$ |
| 6 | any | any | allowed | i.i.d. New-Longer-than-Used | SEDPT-WCR | $O(\ln L/p)$ |
| 7 | any | $k_i = 1$, $d_{\min} \ge L$ | not allowed | i.i.d. New-Shorter-than-Used | SEDPT-R | delay-optimal |

TABLE I: Summary of the delay performance of our proposed policies under different settings, where $d_{\min}$ is the minimum 
distance among all MDS storage codes defined in (2), 1/p is the average chunk downloading time of each thread, and L is 
the number of downloading threads. The classes of “New-Longer-than-Used” and “New-Shorter-than-Used” distributions are 
defined in Section V. Note that the delay gaps in this table are independent of the request arrival process and system traffic 
load. 


exponential with mean 1/p, we propose a Shortest 
Expected Remaining Processing Time policy with Redundant
thread assignment (SERPT-R), and prove that 
SERPT-R is delay-optimal among all online policies, if 
(i) the storage redundancy is sufficiently high and (ii) 
preemption is allowed. If condition (i) is not satisfied, 
we show that under SERPT-R, the extra delay caused 
by low storage redundancy is no more than the average 
downloading time of (ln L + 1) chunks, i.e., (ln L + 1)/p, 
where L is the number of downloading threads. (This 
delay gap grows slowly with respect to L, and is
independent of the request arrival process and system traffic 
load.) Further, if preemption is not allowed, we propose a 
Shortest Expected Differential Processing Time policy 
with Redundant thread assignment (SEDPT-R), which has 
a delay gap of no more than the average downloading 
time of one chunk, i.e., 1/p, compared to the delay- 
optimal policy. 

• When the downloading times of data chunks are i.i.d. 
New-Longer-than-Used (NLU) (defined in Section V), 
we design a Shortest Expected Differential Processing 
Time policy with Work-Conserving Redundant thread 
assignment (SEDPT-WCR) for the preemptive case and a 
Shortest Expected Differential Processing Time policy 
with No Redundant thread assignment (SEDPT-NR) for 
the non-preemptive case. We show that, compared with 
the delay-optimal policy, the delay gaps of preemptive 
SEDPT-WCR and non-preemptive SEDPT-NR are both 
of the order $O(\ln L/p)$. 

• When the downloading times of data chunks are i.i.d. 
New-Shorter-than-Used (NSU) (defined in Section V), we 
prove that SEDPT-R is delay-optimal among all online 
policies, under the conditions that data is stored with 
repetition codes, storage redundancy is sufficiently high, 
and preemption is not allowed. 

We note that the proposed SEDPT-type policies are different 
from the traditional Shortest Remaining Processing Time first 
(SRPT) policy, and have not been proposed in prior work. 

II. Related Work 

The idea of reducing delay via multiple parallel data 
transmissions has been explored empirically in various
contexts EH-E3. More recently, theoretical analysis has been 
conducted to study the delay performance of data retrieval 
in distributed storage systems. One line of studies [26]-[31] 
centered on data retrieval from a small number of 
storage nodes, where the delay performance is limited by the 
service capability of individual storage nodes. It was shown 
in [26] that erasure storage codes can reduce the queueing 
delay compared to simple data replications. In [27], [28], delay 
bounds were provided for First-Come-First-Served (FCFS) 
policies with different numbers of redundant threads. In [29], 
a delay upper bound was obtained for FCFS policies under 
Poisson arrivals and arbitrary downloading time distribution, 
which was further used to derive a sub-optimal solution for 
jointly minimizing latency and storage cost. In [30], the 
authors established delay bounds for the classes of FCFS, 
preemptive and non-preemptive priority scheduling policies, 
when the downloading time is i.i.d. exponential. In [31], the 
authors studied when redundant threads can reduce delay (and 
when they cannot), and designed optimal redundant thread scheduling 
policies among the class of FCFS policies. 

The second line of research [32]-[34] focuses on large- 
scale storage clouds with a large number of storage nodes, 
where the delay performance is constrained by the available 
networking resources of the system. In [32], [33], the authors 
measured the chunk downloading time over the Amazon cloud 
storage system and proposed to adapt code parameters and the 
number of redundant threads to reduce delay. In [34], it was 
shown that FCFS with redundant thread assignment is delay- 
optimal among all online policies, under the assumptions of a 
single storage code, high storage redundancy and exponential 
downloading time distribution. Following this line of research, 
in this paper, we consider the more general scenarios with 
heterogeneous storage codes, general levels of storage
redundancy, and non-exponential downloading time distributions, 
where neither FCFS nor priority scheduling is close to delay- 
optimal. 

III. System Model 

We consider a cloud storage system that is composed of 
one front-end proxy server and a large number of distributed 
storage devices, as illustrated in Fig. 1. The proxy server 
enqueues the user requests and establishes TCP connections 
to fetch data from the storage devices. In practice, the proxy 
server also performs tasks such as format conversion, data 
compression, authentication, and encryption.²

[Fig. 1: system diagram; surviving label: Storage Devices]

A. Data Storage and Retrieval 

Suppose that the file corresponding to request i is stored 
with an $(n_i, k_i)$ MDS code.³ Then, file i is partitioned into $k_i$ 
equal-size chunks, which are encoded into $n_i$ coded chunks 
and stored in $n_i$ distinct devices. In MDS codes, any $k_i$ out of 
the $n_i$ coded chunks are sufficient to restore file i. Therefore, 
the cloud storage system can tolerate $n_i - k_i$ failures and 
still secure file i. Examples of MDS codes include repetition 
codes ($k_i = 1$) and Reed-Solomon codes. Let $d_i$ denote the 
Hamming distance of an $(n_i, k_i)$ MDS code, determined by 

$$d_i = n_i - k_i + 1. \qquad (1)$$

The minimum code distance of all storage codes is defined as 

$$d_{\min} = \min\{d_i, i = 1, 2, \cdots\}. \qquad (2)$$
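
As a quick numeric illustration of (1) and (2), the sketch below evaluates the code distances for the two codes mentioned in the introduction (3 replicas, i.e., a (3, 1) code, and the (14, 10) Reed-Solomon code). The function name `code_distance` is a hypothetical helper, not notation from the paper.

```python
def code_distance(n, k):
    """Hamming distance of an (n, k) MDS code, as in (1): d = n - k + 1."""
    return n - k + 1

# (n_i, k_i) pairs: 3-way replication and a (14, 10) Reed-Solomon code.
codes = [(3, 1), (14, 10)]

distances = [code_distance(n, k) for n, k in codes]   # one d_i per code
d_min = min(distances)                                # minimum distance, as in (2)
tolerated_failures = [n - k for n, k in codes]        # n_i - k_i per code
```

With these two codes, a proxy with more than 3 downloading threads would fall into the "general storage redundancy" regime ($d_{\min} < L$) studied in Section IV.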

It has been reported in El, fim-E! that the downloading 
time of data chunks can be highly unpredictable in storage 
clouds. Some recent measurements [32]-[34] on Amazon 
AWS show that the downloading times of data chunks stored 
with distinct keys can be approximated as independent and 
identically distributed (i.i.d.) random variables. In this paper, 
we assume that the downloading times of data chunks are 
i.i.d.,⁴ as in E21-EH, ED, GS.

B. Redundant and Parallel Thread Scheduling 

The proxy server has L downloading threads, each
representing a potential TCP connection, to retrieve data from the 
distributed storage devices. The value of L is chosen as the 
maximum number of simultaneous TCP connections that can 
occupy all the available networking bandwidth without
significantly degrading the latency of each individual connection 
G2, ED. A decision-maker at the proxy server determines 
which chunks to download and in what order for the L threads 
to minimize the average data retrieving delay. 

Suppose that a sequence of N read requests arrive at the 
queue of the processing server.⁵ Let $a_i$ and $c_{i,\pi}$ denote the 

²Our results can also be used for systems with multiple proxy servers, where 
each read request is routed to a proxy server based on geometrical location, 
or determined by a round robin or random load balancing algorithm. More 
complicated load balancing algorithms will be studied in our future work. 

³The terms “file” and “request” are interchangeable in this paper. 

⁴This assumption is reasonable for large-scale storage clouds, e.g., Amazon
AWS, where individual read operations may experience long latency 
events, such as network congestion, cache misses, database blocking, high 
temperature or high I/O traffic of storage devices, that are unobservable and 
unpredictable by the decision-maker. 

⁵The value of N can be either finite or infinite in this paper. If N tends to 
infinity, a limsup operator is enforced on the right-hand side of (3). 


arrival and completion times of the ith request under policy 
$\pi$, respectively, where $0 = a_1 \le a_2 \le \cdots \le a_N$. Thus, 
the service latency of request i is given by $c_{i,\pi} - a_i$, which 
includes both the downloading time and the waiting time 
in the request queue. We assume that the arrival process 
$(a_1, a_2, \cdots)$ is an arbitrary deterministic time sequence, while 
the departure process $(c_{1,\pi}, c_{2,\pi}, \cdots)$ is stochastic because of 
the random downloading time. Given the request parameters 
N and $(a_i, k_i, n_i)_{i=1}^{N}$, the average flow time of the requests 
under policy $\pi$ is defined as 

$$D_\pi = \frac{1}{N} \sum_{i=1}^{N} \left( E\{c_{i,\pi}\} - a_i \right), \qquad (3)$$

where the expectation is taken with respect to the random 
distribution of chunk downloading time for given policy $\pi$ 
and for given request parameters N and $(a_i, k_i, n_i)_{i=1}^{N}$. The 
goal of this paper is to design low-complexity online thread 
scheduling policies that achieve optimal or near-optimal delay 
performance. 

Definition 1. Online policy: A scheduling policy is said 
to be online if, at any given time t, the decision-maker 
does not know the number of requests to arrive after time 
t, the parameters $(a_i, k_i, n_i)$ of the requests to arrive, or 
the realizations of the (remaining) downloading times of the 
chunks that have not been completed by time t. 

Definition 2. Delay-optimality: A thread scheduling policy $\pi$ 
is said to be delay-optimal if, for any given request parameters 
N and $(a_i, k_i, n_i)_{i=1}^{N}$, it yields the shortest average flow time 
among all online policies. 

A key feature of this scheduling problem is the flexibility 
of redundant and parallel thread scheduling. Take file i as an 
example. When $n_i > k_i$, one can assign redundant threads 
to download more than $k_i$ chunks of file i. The first $k_i$
successfully downloaded chunks are sufficient for completing the 
read operation. After that, the extra downloading threads are 
terminated immediately, which is called service termination. 
By doing this, the retrieving delay of file i is reduced. On 
the other hand, redundant thread scheduling may cause extra 
system load. Therefore, such a policy provides a tradeoff 
between fast retrieving of each file and a potentially longer 
service latency due to the extra system load, which makes it 
difficult to achieve delay-optimality. 

C. Service Preemption and Work Conserving 

We consider chunk-level preemptive and non-preemptive 
policies. When preemption is allowed, a thread can switch 
to serve another chunk at any time, and resume serving the 
previous chunk at a later time, continuing from the
interrupted point. When preemption is not allowed, a thread must 
complete (or terminate) the current chunk before switching to 
serve another chunk. We assume that service terminations and 
preemptions are executed immediately with no extra delay. 

Definition 3. Work-conserving: A scheduling policy is said 
to be work-conserving if all threads are kept busy whenever 
there are chunks waiting to be downloaded. 
Remark 1: If preemption is allowed, a delay-optimal policy 
must be work-conserving, because the average delay of any 
non-work-conserving policy can be reduced by assigning the 
idle threads to download more chunks. Meanwhile, if
preemption is not allowed, a work-conserving policy may not be 
delay-optimal, because the occupied threads cannot be easily 
switched to serve an incoming request with a higher priority. 

IV. Exponential Chunk Downloading Time 

In this section, we study the delay-optimal thread scheduling 
when chunk downloading time is i.i.d. exponentially
distributed with mean 1/p. Non-exponential downloading time 
distributions will be investigated in Section V. 

A. High Storage Redundancy, Preemption is Allowed 

We first consider the case of high storage redundancy such 
that $d_{\min} \ge L$ is satisfied. In this case, we have $n_i - (k_i - 1) \ge L$
for all i. Hence, each file i has at least L available chunks 
even if $k_i - 1$ chunks of file i have been downloaded. Consequently, 
each unfinished request has sufficient available chunks such 
that all L threads can be simultaneously assigned to serve this 
request. 

Let $s_j$ denote the arrival time of the jth arrived chunk 
downloading task of all files and $t_j$ denote the completion time 
of the jth downloaded chunk of all files. The chunk arrival 
process $(s_1, s_2, \cdots)$ is uniquely determined by the request 
parameters $(a_i, k_i)_{i=1}^{N}$. Meanwhile, the chunk departure
process $(t_1, t_2, \cdots)$ satisfies the following invariant distribution 
property: 

Lemma 1. [34, Theorem 6.4] Suppose that (i) $d_{\min} \ge L$ 
and (ii) the chunk downloading time is i.i.d. exponentially 
distributed with mean 1/p. Then, for any given request
parameters N and $(a_i, k_i, n_i)_{i=1}^{N}$, the distribution of the chunk 
departure process $(t_1, t_2, \cdots)$ is invariant under any work- 
conserving policy. 

We propose a preemptive Shortest Expected Remaining 
Processing Time first policy with Redundant thread assignment 
(preemptive SERPT-R): 

Suppose that, at any time t, there are V unfinished requests 
$i_1, i_2, \cdots, i_V$, such that $\alpha_j$ chunks need to be downloaded 
for completing request $i_j$. Under SERPT-R, each idle thread 
is assigned to serve one available chunk of request $i_j$ with 
the smallest $\alpha_j$. (Due to storage redundancy, the number of 
available chunks of request $i_j$ is larger than $\alpha_j$.) If all the 
available chunks of request $i_j$ are under service, then the idle 
thread is assigned to serve one available chunk of request $i_{j'}$ 
with the second smallest $\alpha_{j'}$. This procedure goes on, until all 
L threads are occupied or all the available chunks of the V 
unfinished requests are under service. 
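
The assignment rule above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the function name `serpt_r_assign` and the tuple layout are hypothetical, and because preemption is allowed the sketch simply recomputes the whole assignment of the L threads at a decision epoch.

```python
def serpt_r_assign(requests, num_threads):
    """Sketch of one preemptive SERPT-R decision.

    requests: list of (request_id, remaining_chunks, available_chunks), where
      remaining_chunks is the number of chunks still needed to complete the
      request and available_chunks is the number of chunks that can still be
      downloaded for it.
    Returns {request_id: number_of_threads_assigned}.
    """
    assignment = {}
    free = num_threads
    # Serve requests in increasing order of remaining chunks (ties keep input order).
    for req_id, remaining, available in sorted(requests, key=lambda r: r[1]):
        if free == 0:
            break
        used = min(free, available)   # at most one thread per available chunk
        if used > 0:
            assignment[req_id] = used
            free -= used
    return assignment

# Two requests sharing L = 4 threads: request "r1" needs 1 more chunk and has
# 4 available chunks; request "r2" needs 2 chunks and has only 2 available.
both_waiting = serpt_r_assign([("r1", 1, 4), ("r2", 2, 2)], num_threads=4)
after_r1_departs = serpt_r_assign([("r2", 2, 2)], num_threads=4)
```

In the second call only 2 of the 4 threads can be used, which is the effect of low storage redundancy discussed in Section IV-B.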

This policy is an extension of the Shortest Remaining Processing
Time first (SRPT) policy [15], [16] because it schedules 
parallel and redundant downloading threads to serve the
requests with the least workload. The following theorem shows 
that this policy is delay-optimal under certain conditions. 


Theorem 1. Suppose that (i) $d_{\min} \ge L$, (ii) preemption 
is allowed, and (iii) the chunk downloading time is i.i.d. 
exponentially distributed with mean 1/p. Then, for any given 
request parameters N and $(a_i, k_i, n_i)_{i=1}^{N}$, preemptive SERPT-R 
is delay-optimal among all online policies. 

Remark 2: Theorem 1 and the subsequent theoretical 
results of this paper are difficult to establish for the following 
reasons: 1) Each request i is partitioned into a batch of $k_i$ 
chunk downloading tasks, and the processing time of each 
task is random. 2) There are $n_i - k_i$ redundant chunks for 
request i, such that completing any $k_i$ of the $n_i$ tasks would 
complete the request. 3) The system has L threads which can 
simultaneously process L tasks belonging to one or multiple 
requests. 4) If redundant downloading threads are scheduled, 
the associated extra system load must be considered when 
evaluating the delay performance. 

Proof: We provide a proof sketch of Theorem 1. Consider 
an arbitrarily given chunk departure sample path $(t_1, t_2, \cdots)$. 
According to the property of the SRPT principle [15], [16], 
preemptive SERPT-R minimizes $\sum_{i=1}^{N} (c_{i,\pi} - a_i)$ for any 
given sample path $(t_1, t_2, \cdots)$. Further, Lemma 1 tells us that 
the distribution of $(t_1, t_2, \cdots)$ is invariant among the class 
of work-conserving policies. By this, preemptive SERPT-R 
is delay-optimal among the class of work-conserving policies. 
Finally, since a delay-optimal policy must be work-conserving 
when preemption is allowed, Theorem 1 follows. More details 
are provided in Appendix A. ■ 

In Theorem 6.4 of [34], it was shown that a First-Come- 
First-Served policy with Redundant thread assignment (FCFS- 
R) is delay-optimal when $k_i = k$ for all i and $d_{\min} \ge L$. 
In this case, preemptive SERPT-R reduces to the following 
policy: After a request departs from the system, pick any 
waiting request (not necessarily the request that arrived earliest) 
and assign all L threads to serve the available chunks of this 
request until it departs. Hence, FCFS-R belongs to the class of 
SERPT-R policies, and Theorem 6.4 of [34] is a special case 
of Theorem 1. 

B. General Storage Redundancy, Preemption is Allowed 

When $d_{\min} < L$, some requests may have fewer than L 
available chunks, such that not all of the L threads can be 
assigned to serve them. In this case, SERPT-R may not be delay- 
optimal. This is illustrated in the following example. 

Example 1. Consider two requests with parameters given as 
$(k_1 = 1, n_1 = 4, d_1 = 4, a_1 = 0)$ and $(k_2 = 2, n_2 = 2, d_2 = 1, a_2 = 0)$.
The number of threads is L = 4. Under SERPT-R, 
all 4 threads are assigned to serve request 1 after time zero. 
However, after request 1 is completed, the chunk downloading 
rate is reduced from 4p to 2p, because request 2 only has 
$n_2 = 2$ chunks. Furthermore, after one chunk of request 2 is 
downloaded, the chunk downloading rate is reduced from 2p 
to p. The average flow time of SERPT-R is $D_{\text{SERPT-R}} = 1/p$ 
seconds. 

We consider another policy Q: after time zero, 2 threads 
are assigned to serve request 1 and 2 threads are assigned 
to serve request 2. After the first chunk is downloaded, if 
the downloaded chunk belongs to request 1, then request 1 
departs and 2 threads are assigned to serve request 2. If 
the downloaded chunk belongs to request 2, then 3 threads 
are assigned to serve request 1 and 1 thread is assigned to 
serve request 2. After the second chunk is downloaded, only 
one request is left and the threads are assigned to serve the 
available chunks of this request. The average flow time of 
policy Q is $D_Q = 61/(64p)$ seconds. Hence, SERPT-R is not 
delay-optimal. 
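
The two averages in Example 1 can be checked by hand or with exact rational arithmetic. The sketch below is a verification aid rather than part of the paper; it measures time in units of 1/p and uses only the facts that, with m busy threads, the next chunk completes after a mean time of 1/(m p) and belongs to each busy thread with equal probability. All variable names are illustrative.

```python
from fractions import Fraction as F

# --- SERPT-R (times in units of 1/p) ---
c1 = F(1, 4)                      # request 1: 4 threads, rate 4p
c2 = c1 + F(1, 2) + F(1, 1)       # request 2: rate 2p, then rate p
d_serpt_r = (c1 + c2) / 2         # both requests arrive at time 0

# --- Policy Q ---
first = F(1, 4)                   # first chunk: 4 busy threads
# Case A (prob. 1/2): the chunk belongs to request 1, which departs;
# request 2 then needs 2 chunks at rates 2p and p.
sum_a = first + (first + F(1, 2) + F(1, 1))
# Case B (prob. 1/2): the chunk belongs to request 2; then 3 threads serve
# request 1 and 1 thread serves request 2, so the next chunk takes 1/(4p).
second = first + F(1, 4)
sum_b1 = second + (second + F(1, 1))   # prob. 3/4: request 1 departs first
sum_b2 = second + (second + F(1, 4))   # prob. 1/4: request 2 departs first
sum_b = F(3, 4) * sum_b1 + F(1, 4) * sum_b2
d_q = (F(1, 2) * sum_a + F(1, 2) * sum_b) / 2
```

The gap between the two values is 3/(64p), so the improvement of policy Q over SERPT-R in this example is small but strictly positive.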

Next, we bound the delay penalty associated with removing 
the condition $d_{\min} \ge L$. 


Theorem 2. Suppose that (i) preemption is allowed and (ii) the chunk 
downloading time is i.i.d. exponentially distributed with 
mean 1/p. Then, for any given request parameters N and 
$(a_i, k_i, n_i)_{i=1}^{N}$, the average flow time of preemptive SERPT-R 
satisfies 

$$D_{\text{opt}} \le D_{\text{prmp,SERPT-R}} \le D_{\text{opt}} + \frac{1}{p} \sum_{l=d_{\min}}^{L-1} \frac{1}{l}, \qquad (4)$$

where $d_{\min}$ is defined in (2). 


Proof: Here is a proof sketch of Theorem 2. We first use 
a state evolution argument to show that, after removing the 
condition $d_{\min} \ge L$, SERPT-R needs to download $L - d_{\min}$ 
or fewer additional chunks after any time t, so as to accomplish 
the same number of requests that are completed by SERPT-R 
with the condition $d_{\min} \ge L$ during (0, t]. Further, according 
to the properties of the exponential distribution, the average time 
for the system to download $L - d_{\min}$ extra chunks under the 
conditions of Theorem 2 is upper bounded by the last term of 
(4). This completes the proof. See Appendix B for the details. ■


Note that if $d_{\min} \ge L$, the last term in (4) becomes zero, 
which corresponds to the case of Theorem 1; if $d_{\min} < L$, 
the last term in (4) is upper bounded by $\frac{1}{p}\left(\ln\frac{L-1}{d_{\min}} + 1\right)$. 
Therefore, the delay penalty caused by low storage redundancy 
is of the order $O(\ln L/p)$, and grows only logarithmically with L. 
Further, this delay penalty remains constant for any request 
arrival process and system traffic load. 
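
The logarithmic bound on the last term of (4) is easy to check numerically. The following sketch evaluates the harmonic-type sum and compares it with $\frac{1}{p}(\ln\frac{L-1}{d_{\min}} + 1)$ over a range of parameters; the function names are hypothetical and p = 1 is an arbitrary choice for illustration.

```python
import math

def low_redundancy_gap(d_min, L, p=1.0):
    """Last term of (4): (1/p) * sum_{l=d_min}^{L-1} 1/l (empty sum if d_min >= L)."""
    return sum(1.0 / l for l in range(d_min, L)) / p

def log_bound(d_min, L, p=1.0):
    """Upper bound (1/p) * (ln((L-1)/d_min) + 1), valid for d_min < L."""
    return (math.log((L - 1) / d_min) + 1.0) / p

# Check the bound for every d_min < L over several thread counts.
bound_holds = all(
    low_redundancy_gap(d_min, L) <= log_bound(d_min, L) + 1e-12
    for L in (4, 16, 64, 256)
    for d_min in range(1, L)
)

gap_example = low_redundancy_gap(1, 4)   # 1 + 1/2 + 1/3, in units of 1/p
```

The bound follows from comparing the sum with an integral: the first term is at most 1 and the remaining terms are at most $\int_{d_{\min}}^{L-1} dx/x$.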


C. High Storage Redundancy, Preemption is Not Allowed 

Under preemptive SERPT-R, each thread can switch to serve 
another request at any time. However, when preemption is not 
allowed, a thread must complete or terminate the current chunk 
downloading task before switching to serve another request. 
In this case, SERPT-R may not be delay-optimal, as illustrated 
in the following example. 

Example 2. Consider two requests with parameters given as 
$(k_1 = 2, n_1 = 3, d_1 = 2, a_1 = 0)$ and $(k_2 = 1, n_2 = 2, d_2 = 2, a_2 = \epsilon)$,
where $\epsilon > 0$ can be arbitrarily close to zero. 
The number of threads is L = 2, and the chunk downloading 
time is i.i.d. exponentially distributed with mean 1/p. Under 
SERPT-R, the two threads are assigned to serve request 1 after 
time zero. After the first chunk is downloaded, one thread is 
assigned to serve request 2 and the other thread continues to 
serve request 1. After the second chunk is downloaded, one of 
the requests has departed, and the two threads are assigned to 
serve the remaining request. The average flow time of SERPT-R
is $D_{\text{SERPT-R}} = 5/(4p) - \epsilon/2$ seconds. 

We consider another non-preemptive policy Q: the threads 
remain idle until time $\epsilon$. After $\epsilon$, the two threads are assigned 
to serve request 2. After the first chunk is downloaded, request 
2 has departed. Then, the two threads are assigned to serve 
request 1, until it departs. The average flow time of policy 
Q is $D_Q = 1/p + \epsilon/2$ seconds. Since $\epsilon$ is arbitrarily small, 
SERPT-R is not delay-optimal when preemption is not allowed. 
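
The two averages in Example 2 can be recovered with a short calculation. The derivation below is a reader's check, not part of the paper; it assumes $\epsilon$ is small enough that the event of a chunk completing before time $\epsilon$ can be neglected, and it writes $t_1, t_2, t_3$ for the first three chunk completion times under SERPT-R and $c_1, c_2$ for the request completion times under policy Q. With both threads busy, chunks complete at rate $2p$, so consecutive completions are $1/(2p)$ apart on average.

```latex
\begin{align*}
\text{SERPT-R:}\quad
  & E[t_1] = \tfrac{1}{2p},\qquad E[t_2] = \tfrac{1}{p},\qquad E[t_3] = \tfrac{3}{2p},\\
  & D_{\text{SERPT-R}} = \tfrac{1}{2}\bigl(E[t_2] + E[t_3] - a_1 - a_2\bigr)
      = \tfrac{1}{2}\Bigl(\tfrac{5}{2p} - \epsilon\Bigr)
      = \tfrac{5}{4p} - \tfrac{\epsilon}{2},\\[4pt]
\text{Policy } Q:\quad
  & E[c_2] = \epsilon + \tfrac{1}{2p},\qquad E[c_1] = \epsilon + \tfrac{3}{2p},\\
  & D_Q = \tfrac{1}{2}\bigl((E[c_2] - \epsilon) + (E[c_1] - 0)\bigr)
      = \tfrac{1}{2}\Bigl(\tfrac{2}{p} + \epsilon\Bigr)
      = \tfrac{1}{p} + \tfrac{\epsilon}{2},\\[4pt]
  & D_{\text{SERPT-R}} - D_Q = \tfrac{1}{4p} - \epsilon > 0
      \quad\text{for } \epsilon < \tfrac{1}{4p}.
\end{align*}
```

Under SERPT-R one request departs at $t_2$ and the other at $t_3$, regardless of which chunk finishes second, which is why only the sum $E[t_2] + E[t_3]$ is needed.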

We propose a non-preemptive Shortest Expected Differential
Processing Time first policy with Redundant thread 
assignment (non-preemptive SEDPT-R), where the service 
priority of a file is determined by the difference between the 
number of remaining chunks of the file and the number of 
threads that have been assigned to the file. 

Suppose that, at any time t, there are V unfinished requests 
$i_1, i_2, \cdots, i_V$, such that $\alpha_j$ chunks need to be downloaded 
for completing request $i_j$ at time t and $\delta_j$ threads have been 
assigned to serve request $i_j$. Under non-preemptive SEDPT-R,
each idle thread is assigned to serve one available chunk 
of request $i_j$ with the smallest $\alpha_j - \delta_j$. (Due to storage 
redundancy, the number of available chunks of request $i_j$ is 
larger than $\alpha_j$. Hence, it may happen that $\alpha_j - \delta_j < 0$ because 
of redundant chunk downloading.) If all the available chunks 
of request $i_j$ are under service, then the idle thread is assigned 
to serve one available chunk of request $i_{j'}$ with the second 
smallest $\alpha_{j'} - \delta_{j'}$. This procedure goes on, until all L threads 
are occupied or all the available chunks of the V unfinished 
requests are under service. 
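
A minimal sketch of this assignment rule is given below. It is an illustration under stated assumptions rather than the authors' implementation: the function name `sedpt_r_assign` and the dictionary keys are hypothetical, idle threads are assigned one at a time, and the count of assigned threads is assumed to be updated after each assignment. Threads that are already busy are never touched, which reflects the non-preemptive setting.

```python
def sedpt_r_assign(requests, idle_threads):
    """Sketch of one non-preemptive SEDPT-R decision.

    requests: {request_id: {"remaining": chunks still needed,
                            "assigned": threads already serving the request,
                            "available": chunks not yet downloaded}}
    Returns the list of request ids chosen, one entry per idle thread used.
    """
    state = {rid: dict(r) for rid, r in requests.items()}   # work on a copy
    schedule = []
    for _ in range(idle_threads):
        # A request is eligible only if it has an available chunk not under service.
        eligible = [rid for rid, r in state.items()
                    if r["assigned"] < r["available"]]
        if not eligible:
            break
        # Priority: smallest (remaining chunks - assigned threads).
        best = min(eligible,
                   key=lambda rid: state[rid]["remaining"] - state[rid]["assigned"])
        state[best]["assigned"] += 1
        schedule.append(best)
    return schedule

requests = {
    "r1": {"remaining": 2, "assigned": 0, "available": 3},
    "r2": {"remaining": 1, "assigned": 0, "available": 2},
}
schedule = sedpt_r_assign(requests, idle_threads=3)
```

In this usage example, request "r2" receives two threads although it needs only one more chunk, so its priority value drops to -1; this is the redundant downloading that the parenthetical remark above refers to.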

The intuition behind non-preemptive SEDPT-R is that $\delta_j$ 
chunks of request $i_j$ will be under service after time t for 
any non-preemptive policy, and thus should be excluded when 
determining the service priority of request $i_j$. This is different 
from the traditional SRPT-type policies [13]-[19], which do 
not exclude the chunks under service when determining the 
service priorities of the requests. The delay performance of 
this policy is characterized in the following theorem: 

Theorem 3. Suppose that (i) $d_{\min} \ge L$, (ii) preemption is 
not allowed, and (iii) the chunk downloading time is i.i.d. 
exponentially distributed with mean 1/p. Then, for any given 
request parameters N and $(a_i, k_i, n_i)_{i=1}^{N}$, the average flow 
time of non-preemptive SEDPT-R satisfies 

$$D_{\text{opt}} \le D_{\text{non-prmp,SEDPT-R}} \le D_{\text{opt}} + \frac{1}{p}. \qquad (5)$$

Proof: We provide a proof sketch of Theorem 3. Theorem 
1 tells us that preemptive SERPT-R provides a lower bound of 
$D_{\text{opt}}$. On the other hand, non-preemptive SEDPT-R provides 
an upper bound of $D_{\text{opt}}$. Thus, we need to show that the 
delay gap between preemptive SERPT-R and non-preemptive 
SEDPT-R is at most 1/p. Towards this goal, we use a state 
evolution argument to show that for any time t and any given 
sample path of chunk departures $(t_1, t_2, \cdots)$, non-preemptive 
SEDPT-R needs to download L or fewer additional chunks 
after time t, so as to accomplish the same number of requests 
that are completed under preemptive SERPT-R during (0, t]. 
By the properties of the exponential distribution, the average time 
for the L threads to download L chunks under non-preemptive 
SEDPT-R is 1/p, and Theorem 3 follows. See Appendix C for 
the details. ■ 

Theorem 3 tells us that the delay gap between non- 
preemptive SEDPT-R and the optimal policy is at most the 
average downloading time of one chunk by each thread, i.e., 
1/p. Intuitively speaking, this is because each thread only 
needs to wait for downloading one chunk, before switching 
to serve another request. However, the proof of Theorem 3 is 
non-trivial, because it must work for any possible sample path 
of the downloading procedure. 


D. General Storage Redundancy, Preemption is Not Allowed 

When preemption is not allowed and the condition $d_{\min} \ge L$
is removed, we have the following result. 

Theorem 4. Suppose that (i) preemption is not allowed, 
and (ii) the chunk downloading time is i.i.d. exponentially 
distributed with mean 1/p. Then, for any given request
parameters N and $(a_i, k_i, n_i)_{i=1}^{N}$, the average flow time of non- 
preemptive SEDPT-R satisfies 

$$D_{\text{opt}} \le D_{\text{non-prmp,SEDPT-R}} \le D_{\text{opt}} + \frac{1}{p} + \frac{1}{p} \sum_{l=d_{\min}}^{L-1} \frac{1}{l}, \qquad (6)$$

where $d_{\min}$ is defined in (2). 

Proof: See Appendix D. ■ 

If $d_{\min} \ge L$, the last term in (6) becomes zero, which 
corresponds to the case of Theorem 3. 


V. Non-Exponential Chunk Downloading Time 

In this section, we consider two classes of general
downloading time distributions: New-Longer-than-Used (NLU)
distributions and New-Shorter-than-Used (NSU) distributions, 
defined as follows.⁶

Definition 4. New-Longer-than-Used distributions: A
distribution on $[0, \infty)$ is said to be New-Longer-than-Used (NLU) 
if, for all $t, \tau \ge 0$ with $P(X > \tau) > 0$, the distribution satisfies 

$$P(X > t) \ge P(X > t + \tau \mid X > \tau). \qquad (7)$$

New-Shorter-than-Used distributions: A distribution on 
$[0, \infty)$ is said to be New-Shorter-than-Used (NSU) if, for all 
$t, \tau \ge 0$ with $P(X > \tau) > 0$, the distribution satisfies 

$$P(X > t) \le P(X > t + \tau \mid X > \tau). \qquad (8)$$

NLU (NSU) distributions are closely related to log-concave 
(log-convex) distributions. Many commonly used distributions 
are NLU or NSU distributions [35]. In practice, NLU
distributions can be used to characterize the scenarios where the 
6 Note that New-Longer-than-Used (New-Shorter-than-Used) is equivalent 
to the term New-Better-than-Used (New-Worse-than-Used) used in reliability 
theory ED, ED, where “better” means a longer lifetime. However, this may 
lead to confusion in the current paper, where “better” means a shorter delay. 
We choose to use New-Longer-than-Used (New-Shorter-than-Used) to avoid 
confusion. In a recent work EO, the New-Longer-than-Used (New-Shorter- 
than-Used) property was termed light-e very where (heavy-every where). 


downloading time is a constant value followed by a short 
latency tail. For instance, recent studies 1:32|, |33J suggest 
that the data downloading time of Amazon AWS can be 
approximated as a constant delay plus an exponentially dis¬ 
tributed random variable, which is an NLU distribution. On 
the other hand, NSU distributions can be used to characterize 
occasional slow responses resulting from TCP retransmissions, 
I/O interference, database blocking and/or even disk failures. 
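The NLU and NSU conditions in Definition 4 can be checked numerically on a finite grid. The sketch below is only an illustration, not part of the paper's analysis: the function names, the grid, and the parameter values (time measured in units of the mean-scale $1/\mu$, a shifted exponential for the NLU case and a two-phase mixture of exponentials for the NSU case) are assumptions chosen for the example, and a grid check is evidence rather than a proof.

```python
import math

def shifted_exp_tail(x, shift=0.4, rate=1.0 / 0.6):
    """P(X > x) for X = shift + Exponential(rate); illustrative parameters."""
    return 1.0 if x < shift else math.exp(-rate * (x - shift))

def hyperexp_tail(x, rates=(0.4, 1.6), probs=(0.5, 0.5)):
    """P(X > x) for a mixture of exponentials; illustrative parameters."""
    return sum(p * math.exp(-r * x) for p, r in zip(probs, rates))

def classify(tail, grid, tol=1e-12):
    """Compare P(X > t) with P(X > t + tau | X > tau) for all grid points.

    Returns (looks_nlu, looks_nsu) for the tested grid only.
    """
    looks_nlu = looks_nsu = True
    for tau in grid:
        s_tau = tail(tau)
        if s_tau <= 0.0:
            continue  # the definition only applies when P(X > tau) > 0
        for t in grid:
            fresh = tail(t)              # tail of a new download
            used = tail(t + tau) / s_tau  # tail of a download aged tau
            if fresh < used - tol:
                looks_nlu = False
            if fresh > used + tol:
                looks_nsu = False
    return looks_nlu, looks_nsu

grid = [0.1 * i for i in range(51)]
shifted_result = classify(shifted_exp_tail, grid)
hyper_result = classify(hyperexp_tail, grid)
print(shifted_result, hyper_result)
```

On this grid the shifted exponential satisfies only inequality (7) and the mixture satisfies only inequality (8), in line with the examples given in the text.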

We will require the following definitions. Let
$x = (x_1, x_2, \ldots, x_m)$ and $y = (y_1, y_2, \ldots, y_m)$ be two vectors in
$\mathbb{R}^m$; we write $x \le y$ if $x_i \le y_i$ for $i = 1, 2, \ldots, m$.

Definition 5. Stochastic Ordering [36]: Let $X$ and $Y$ be two
random variables. Then, $X$ is said to be stochastically smaller
than $Y$ (denoted as $X \le_{\text{st}} Y$) if

$$P(X > t) \le P(Y > t) \quad \text{for all } t \in \mathbb{R}. \tag{9}$$

Definition 6. Multivariate Stochastic Ordering [36]: A set
$U \subseteq \mathbb{R}^m$ is called upper if $y \in U$ whenever $y \ge x$ and $x \in U$.
Let $X$ and $Y$ be two random vectors. Then, $X$ is said to be
stochastically smaller than $Y$ (denoted as $X \le_{\text{st}} Y$) if

$$P(X \in U) \le P(Y \in U) \quad \text{for all upper sets } U \subseteq \mathbb{R}^m. \tag{10}$$

Stochastic ordering of stochastic processes (or infinite
vectors) can be defined similarly [36].

A. NLU Chunk Downloading Time Distributions 

We consider a non-preemptive Shortest Expected Differential
Processing Time first policy with No Redundant thread
assignment (non-preemptive SEDPT-NR):

Suppose that, at any time $t$, there are $V$ unfinished requests
$i_1, i_2, \ldots, i_V$, such that $\alpha_j$ chunks need to be downloaded
for completing request $i_j$ at time $t$ and $\delta_j$ threads have been
assigned to serve request $i_j$. Under non-preemptive SEDPT-NR,
each idle thread is assigned to serve one available chunk
of the request $i_j$ with the smallest $\alpha_j - \delta_j$. If $\alpha_j$ threads have
already been assigned to request $i_j$, then the idle thread is
assigned to serve one available chunk of the request $i_{j'}$ with the
second smallest $\alpha_{j'} - \delta_{j'}$. This procedure goes on until all $L$
threads are occupied or each request $i_j$ is served by $\alpha_j$ threads.
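The assignment rule above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `sedpt_nr_assign` is hypothetical, ties are broken by request index (the text does not specify a tie-breaking rule), and every request is assumed to have enough available chunks for the threads it receives.

```python
def sedpt_nr_assign(remaining, assigned, num_threads):
    """Distribute idle threads following the SEDPT-NR rule described above.

    remaining[j]: chunks still needed to complete request j (alpha_j).
    assigned[j]:  threads currently serving request j (delta_j).
    Returns the new list of per-request thread counts.
    """
    assigned = list(assigned)
    idle = num_threads - sum(assigned)
    while idle > 0:
        # Only requests with fewer than alpha_j threads may take another one
        # (no redundant assignment).
        candidates = [j for j in range(len(remaining))
                      if assigned[j] < remaining[j]]
        if not candidates:
            break  # every request is saturated; the extra threads stay idle
        # Smallest differential alpha_j - delta_j first; index breaks ties
        # (tie-breaking is an assumption of this sketch).
        j = min(candidates, key=lambda q: (remaining[q] - assigned[q], q))
        assigned[j] += 1
        idle -= 1
    return assigned

print(sedpt_nr_assign([3, 1, 2], [0, 0, 0], 5))
print(sedpt_nr_assign([3, 1, 2], [0, 0, 0], 8))
```

In the second call only six of the eight threads are used, which illustrates why SEDPT-NR is not work-conserving.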

Note that since at most $\alpha_j$ threads are assigned to request
$i_j$, we have $\alpha_j - \delta_j \ge 0$ for all $i_j$ under non-preemptive
SEDPT-NR. SEDPT-NR is a non-work-conserving policy. When
preemption is allowed, the delay performance of SEDPT-NR can be
improved by exploiting the idle threads to download redundant
chunks. This leads to a preemptive Shortest Expected Differential
Processing Time first policy with Work-Conserving Redundant
thread assignment (preemptive SEDPT-WCR):

Upon the decision of SEDPT-NR, if each request $i_j$ is served
by $\alpha_j$ threads and there are still some idle threads, then assign
these threads to download some redundant chunks to avoid
idleness. When a new request arrives, the threads downloading
redundant chunks will be preempted to serve the newly arrived
request.

Let us consider the service time for a thread to complete
downloading one chunk. If the thread has spent $\tau$ seconds
on one chunk, the tail probability for completing the current
chunk under service is $P(X > t + \tau \mid X > \tau)$. On the other
hand, the tail probability for switching to serve a new chunk
is $P(X > t)$. Since the chunk downloading time is i.i.d. NLU,
it is stochastically better to keep downloading the same chunk
than to switch to a new chunk.

Lemma 2. Suppose that (i) the system load is high such
that all $L$ threads are occupied at all times $t \ge 0$ and (ii)
the chunk downloading time is i.i.d. NLU. Then, for any
given request parameters $N$ and $(a_i, k_i, n_i)_{i=1}^{N}$, the chunk
departure instants $(t_1, t_2, \ldots)$ under non-preemptive SEDPT-NR
are stochastically smaller than those under any other
online policy.

Proof: See Appendix E. ■

Lemma 3. Suppose that (i) the system load is high such
that all $L$ threads are occupied at all times $t \ge 0$, (ii)
preemption is not allowed, and (iii) the chunk downloading
time is i.i.d. NLU. Then, for any given request parameters $N$
and $(a_i, k_i, n_i)_{i=1}^{N}$, the average flow time of non-preemptive
SEDPT-NR satisfies

$$\overline{D}_{\text{opt}} \le \overline{D}_{\text{non-prmp,SEDPT-NR}} \le \overline{D}_{\text{opt}} + \mathbb{E}\Big\{\max_{l=1,\ldots,L} X_l\Big\}, \tag{11}$$

where the $X_l$'s are i.i.d. chunk downloading times.

Proof: See Appendix F. ■

If the average chunk downloading time is $\mathbb{E}\{X_l\} = 1/\mu$,
then the last term in (11) is bounded by

$$\frac{1}{\mu} \le \mathbb{E}\Big\{\max_{l=1,\ldots,L} X_l\Big\} \le \sum_{l=1}^{L} \frac{1}{l\mu}, \tag{12}$$

where the lower bound is trivial, and the upper bound follows
from the property of New-Longer-than-Used distributions in
Proposition 2 of [37]. Therefore, the delay gap in Lemma 3
is no more than $(\ln L + 1)/\mu$. Next, we remove condition (i)
in Lemma 3 and obtain the following result.

Theorem 5. Suppose that (i) preemption is not allowed and
(ii) the chunk downloading time is i.i.d. NLU. Then, for any
given request parameters $N$ and $(a_i, k_i, n_i)_{i=1}^{N}$, the average
flow time of non-preemptive SEDPT-NR satisfies

$$\overline{D}_{\text{opt}} \le \overline{D}_{\text{non-prmp,SEDPT-NR}} \le \overline{D}_{\text{opt}} + \mathbb{E}\Big\{\max_{l=1,\ldots,L} X_l\Big\} + \mathbb{E}\Big\{\max_{l=1,\ldots,L-1} X_l\Big\}, \tag{13}$$

where the $X_l$'s are i.i.d. chunk downloading times.

Proof: See Appendix G. ■

When preemption is allowed, preemptive SEDPT-WCR can 
achieve a shorter average delay than non-preemptive SEDPT- 
NR. In this case, we have the following result. 

Theorem 6. Suppose that (i) preemption is allowed and (ii)
the chunk downloading time is i.i.d. NLU. Then, for any given
request parameters $N$ and $(a_i, k_i, n_i)_{i=1}^{N}$, the average flow
time of preemptive SEDPT-WCR satisfies

$$\overline{D}_{\text{opt}} \le \overline{D}_{\text{prmp,SEDPT-WCR}} \le \overline{D}_{\text{opt}} + \mathbb{E}\Big\{\max_{l=1,\ldots,L} X_l\Big\} + \mathbb{E}\Big\{\max_{l=1,\ldots,L-1} X_l\Big\}, \tag{14}$$

where the $X_l$'s are i.i.d. chunk downloading times.

Proof: See Appendix H. ■

Similar to Lemma 3, the delay gaps in Theorems 5 and 6
are also of the order $O(\ln L/\mu)$.
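To get a feel for the size of these gaps, the two expected-maximum terms can be evaluated in closed form for one simple NLU example: a constant plus an exponential, for which the maximum of $L$ i.i.d. copies has mean equal to the constant plus the exponential mean times the harmonic number $H_L$. The sketch below is illustrative only; the helper names are hypothetical, and the parameter values ($1/\mu = 0.02$s, $L = 3$, a constant part equal to 40% of the mean) are assumptions borrowed from the setting of the numerical section.

```python
def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n, with H_0 = 0."""
    return sum(1.0 / l for l in range(1, n + 1))

def expected_max_shifted_exp(num, mean, const_frac):
    """E[max of `num` i.i.d. copies of X] for X = constant + exponential.

    `mean` is E[X]; a fraction `const_frac` of it is the constant part and
    the remainder is the mean of the exponential part (assumed split).
    """
    const = const_frac * mean
    exp_mean = (1.0 - const_frac) * mean
    return const + exp_mean * harmonic(num)

mean = 0.02   # 1/mu in seconds (illustrative)
L = 3         # number of threads (illustrative)

e_max_L = expected_max_shifted_exp(L, mean, 0.4)
e_max_Lm1 = expected_max_shifted_exp(L - 1, mean, 0.4)
gap = e_max_L + e_max_Lm1        # the two extra terms in (13) and (14)
upper_12 = mean * harmonic(L)    # right-hand side of (12)

print(e_max_L, e_max_Lm1, gap, upper_12)
```

Under these assumed values the single-term bound of (12) holds with room to spare, and the two-term gap evaluates to about 0.056s.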


B. NSU Chunk Downloading Time Distributions 

If the chunk downloading time is i.i.d. NSU, one can 
show that it is stochastically better to switch to a new chunk 
than sticking to downloading the same chunk. We consider 
the scenario that preemption is not allowed and obtain the 
following result. 

Lemma 4. Suppose that (i) $d_{\min} \ge L$, (ii) $k_i = 1$ for
all $i$, (iii) preemption is not allowed, and (iv) the chunk
downloading time is i.i.d. NSU. Then, for any given request
parameters $N$ and $(a_i, k_i = 1, n_i)_{i=1}^{N}$, the chunk departure
instants $(t_1, t_2, \ldots, t_N)$ under non-preemptive SEDPT-R are
stochastically smaller than those under any other online
policy.

Proof: See Appendix I. ■

Theorem 7. Suppose that (i) $d_{\min} \ge L$, (ii) $k_i = 1$ for all $i$,
(iii) preemption is not allowed, and (iv) the chunk downloading
time is i.i.d. NSU. Then, for any given request parameters $N$
and $(a_i, k_i = 1, n_i)_{i=1}^{N}$, non-preemptive SEDPT-R is
delay-optimal among all online policies.

Proof: See Appendix J. ■

A special case of Theorem [7] was obtained in Theorem 3 of 
ED, where delay-optimality was shown only for high system 
load such that all L threads are occupied at all time. 

VI. Numerical Results 

We present some numerical results to illustrate the delay 
performance of different scheduling policies and validate the 
theoretical results. All these results are averaged over 100 
random samples for the downloading times of data chunks. 


A. Exponential Chunk Downloading Time Distributions 

Consider a system with $N = 3000$ request arrivals, among
which $p_1 = 90\%$ of the requested files are stored with a
$(n_1, k_1, d_1) = (3, 1, 3)$ repetition code, and $p_2 = 10\%$ of
the requested files are stored with a $(n_2, k_2, d_2) = (14, 10, 5)$
Reed-Solomon code. Therefore, $d_{\min} = 3$. The code parameters
are drawn at random, i.i.d. from these two classes.
The inter-arrival time of the requests is i.i.d. distributed as a
mixture of exponentials:

$$\begin{cases} \text{Exponential}(\text{rate} = 0.5\lambda) & \text{with probability } 0.99; \\ \text{Exponential}(\text{rate} = 50.5\lambda) & \text{with probability } 0.01. \end{cases}$$




[Figure 2, four panels: (a) preemption is allowed, $d_{\min} = L = 3$; (b) preemption is not allowed, $d_{\min} = L = 3$; (c) preemption is allowed, $d_{\min} < L = 5$; (d) preemption is not allowed, $d_{\min} < L = 5$.]

Fig. 2: Average flow time $\overline{D}_\pi$ versus traffic intensity $\rho$, where the chunk downloading time is i.i.d. exponentially distributed.


The average chunk downloading time is $1/\mu = 0.02$s. The
traffic intensity $\rho$ is determined by

$$\rho = \frac{(p_1 k_1 + p_2 k_2)\lambda}{L\mu}. \tag{15}$$
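As a small worked example of (15), the following sketch computes the traffic intensity for the two code classes of this subsection and inverts the formula to find the parameter $\lambda$ that yields a target intensity. The function names are hypothetical, and the target value 0.5 and the choice $L = 3$ are assumptions picked only for illustration.

```python
def traffic_intensity(lam, mu, num_threads, classes):
    """Evaluate (15); `classes` is a list of (probability, k) pairs."""
    mean_k = sum(p * k for p, k in classes)
    return mean_k * lam / (num_threads * mu)

def lambda_for_intensity(rho, mu, num_threads, classes):
    """Solve (15) for lambda given a target traffic intensity rho."""
    mean_k = sum(p * k for p, k in classes)
    return rho * num_threads * mu / mean_k

classes = [(0.9, 1), (0.1, 10)]   # (p_1, k_1) and (p_2, k_2) from the text
mu = 1.0 / 0.02                   # chunk service rate, from 1/mu = 0.02 s
lam = lambda_for_intensity(0.5, mu, 3, classes)

print(lam, traffic_intensity(lam, mu, 3, classes))
```

With these inputs the average number of chunks per request is $0.9 \cdot 1 + 0.1 \cdot 10 = 1.9$, so a target intensity of 0.5 corresponds to $\lambda = 75/1.9 \approx 39.5$.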

Figures 2(a)-(d) illustrate the numerical results of average
flow time $\overline{D}_\pi$ versus traffic intensity $\rho$ for 4 scenarios where
the chunk downloading time is i.i.d. exponentially distributed.
One can observe that SERPT-R and SEDPT-R have shorter average
flow times than the First-Come-First-Served policy with
Redundant thread assignment (FCFS-R) [34]. If $L = d_{\min} = 3$
and preemption is allowed, by Theorem 1, preemptive SERPT-R
is delay-optimal. For the other 3 scenarios, upper and lower
bounds of the optimum delay performance are plotted. By
comparing with the delay lower bound, we find that the extra
delay caused by non-preemption is 0.0114s, which is smaller
than $1/\mu = 0.02$s, and the extra delay caused by $d_{\min} < L$ is
0.0034s, which is smaller than $\frac{1}{\mu}\sum_{l=3}^{4}\frac{1}{l} = 0.0117$s. These
results are in accordance with Theorems 1-4.
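The two theoretical gap values quoted above can be reproduced directly from the bounds. The sketch below is illustrative; the function names are hypothetical, while the inputs ($1/\mu = 0.02$s, $L = 5$, $d_{\min} = 3$) and the measured delays 0.0114s and 0.0034s are taken from the text.

```python
def nonpreemption_gap(mu):
    """Extra delay bound due to non-preemption: 1/mu."""
    return 1.0 / mu

def redundancy_gap(mu, num_threads, d_min):
    """Extra delay bound due to low redundancy: sum_{l=d_min}^{L-1} 1/(l*mu).

    The sum is empty (zero) when d_min >= num_threads.
    """
    return sum(1.0 / (l * mu) for l in range(d_min, num_threads))

mu = 1.0 / 0.02
gap_np = nonpreemption_gap(mu)
gap_red = redundancy_gap(mu, 5, 3)

print(round(gap_np, 4), round(gap_red, 4))
```

Both measured extra delays fall below the corresponding bounds, and the redundancy term vanishes when $d_{\min} \ge L$.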

B. NLU Chunk Downloading Time Distributions

For the NLU distributions, the system setup is the same
as that in the previous subsection. We assume that the
chunk downloading time $X$ is i.i.d. distributed as the sum of a
constant and a value drawn from an exponential distribution:

$$\Pr(X > x) = \begin{cases} 1, & \text{if } x < \frac{0.4}{\mu}; \\ \exp\left[-\frac{\mu}{0.6}\left(x - \frac{0.4}{\mu}\right)\right], & \text{if } x \ge \frac{0.4}{\mu}, \end{cases} \tag{16}$$

which was proposed in [32], [33] to model the data downloading
time in the Amazon AWS system. The traffic intensity $\rho$
is also given by (15).

Figure 3 illustrates the average flow time versus traffic
intensity $\rho$ when $L = 3$ and the chunk downloading time
is i.i.d. NLU. As expected, preemptive SEDPT-WCR has a
shorter average delay than non-preemptive SEDPT-NR. In the
preemptive case, the delay performance of SEDPT-WCR is
much better than those of non-preemptive SEDPT-R and the
First-Come-First-Served policy with Work-Conserving Redundant
thread assignment (FCFS-WCR). Therefore, preemptive
SEDPT-WCR and non-preemptive SEDPT-NR are appropriate
for i.i.d. NLU downloading time distributions. By comparing
with the delay lower bound, we find that the maximum
extra delays of preemptive SEDPT-WCR and non-preemptive
SEDPT-NR are 0.0229s and 0.0230s, respectively. Both of
them are smaller than the delay gap in Theorems 5 and 6,
whose value is 0.0560s.

C. NSU Chunk Downloading Time Distributions

For NSU distributions, we consider that all $N = 3000$
requested files are stored with a $(n_1, k_1, d_1) = (3, 1, 3)$
repetition code. The chunk downloading time $X$ is chosen
i.i.d. as a mixture of exponentials:


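$$\begin{cases} \text{Exponential}(\text{rate} = 0.4\mu) & \text{with probability } 0.5; \\ \text{Exponential}(\text{rate} = 1.6\mu) & \text{with probability } 0.5. \end{cases}$$

Under SEDPT-R, the average time for completing one chunk
is $\mathbb{E}\{\min_{l=1,\ldots,L} X_l\}$, where the $X_l$'s are i.i.d. chunk
downloading times. Therefore, the traffic intensity $\rho$ is

$$\rho = \lambda\, \mathbb{E}\Big\{\min_{l=1,\ldots,L} X_l\Big\}. \tag{17}$$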
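For a two-phase mixture of exponentials, the expected minimum in (17) has a closed form: the tail of the minimum of $L$ i.i.d. copies is the $L$-th power of the single-copy tail, and expanding that power with the binomial theorem gives a sum of exponentials that integrates term by term. The sketch below is a minimal illustration; the function names are hypothetical, and the rates $0.4\mu$ and $1.6\mu$ with equal probabilities are read from the (partly garbled) source and should be treated as assumptions.

```python
from math import comb

def expected_min_hyperexp(num, mu, rates=(0.4, 1.6), probs=(0.5, 0.5)):
    """E[min of `num` i.i.d. copies of X], X a two-phase hyperexponential.

    Uses E[min] = integral of S(t)**num with
    S(t) = p1*exp(-r1*mu*t) + p2*exp(-r2*mu*t).
    """
    (r1, r2), (p1, p2) = rates, probs
    total = 0.0
    for j in range(num + 1):
        weight = comb(num, j) * p1 ** (num - j) * p2 ** j
        rate = ((num - j) * r1 + j * r2) * mu
        total += weight / rate   # integral of weight * exp(-rate * t)
    return total

def traffic_intensity_nsu(lam, mu, num_threads):
    """Evaluate (17) for the assumed mixture."""
    return lam * expected_min_hyperexp(num_threads, mu)

print(expected_min_hyperexp(1, 1.0), expected_min_hyperexp(3, 1.0))
```

With $\mu = 1$ the single-copy mean is 1.5625 and the expected minimum over three threads is 0.390625, so redundant downloading shortens the per-chunk completion time by a factor of four in this example.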


Figure 4 shows the average flow time $\overline{D}_\pi$ versus traffic
intensity $\rho$, where $L = 3$, preemption is not allowed, and the
chunk downloading time is i.i.d. NSU. In this case, SEDPT-R
is delay-optimal. We observe that the delay performance
of SEDPT-WCR is much worse, and the delay gap between
SEDPT-R and SEDPT-WCR is unbounded. This is because
SEDPT-WCR has a smaller throughput region than SEDPT-R.
Therefore, SEDPT-R is appropriate for i.i.d. NSU downloading
time distributions.

Fig. 3: Average flow time $\overline{D}_\pi$ versus traffic intensity $\rho$, where
the chunk downloading time is i.i.d. NLU.

Fig. 4: Average flow time $\overline{D}_\pi$ versus traffic intensity $\rho$, where
the chunk downloading time is i.i.d. NSU.

VII. Conclusions 

In this paper, we have analytically characterized the delay- 
optimality of data retrieving in distributed storage systems with 
multiple storage codes. Low-latency thread scheduling policies 
have been designed by combining the advantages of SERPT 
in the preemptive case (or SEDPT in the non-preemptive case) 
and redundant thread assignment. Under several important 
settings, we have shown that the proposed policies are either 
delay-optimal or within a constant gap from the optimum delay 
performance. 

There are several important open problems concerning the 
analytical characterization of data retrieving delay: 

• What is the optimal policy for other classes of
non-exponential service distributions?

• What is the optimal policy when the service time
distributions are heterogeneous across data chunks?

• What is the optimal policy when latency and downloading 
cost need to be jointly considered? 

• How to design low-latency policies under delay metrics 
other than average flow time? 
Appendix A
Proof of Theorem 1

First, consider an arbitrarily given sample path of chunk
departures $(t_1, t_2, \ldots)$. According to the conditions of Theorem 1,
the request parameters $N$ and $(a_i, k_i, n_i)_{i=1}^{N}$ are fixed. Then,
the request completion times $(c_{1,\pi}, c_{2,\pi}, \ldots)$ of a policy $\pi$ are
determined by which request each departed chunk belongs to. Let
$D_\pi(t_1, t_2, \ldots) = \frac{1}{N}\sum_{i=1}^{N}(c_{i,\pi} - a_i)$ denote the sample-path
average delay of policy $\pi$ for given request parameters $N$,
$(a_i, k_i, n_i)_{i=1}^{N}$ and chunk departures $(t_1, t_2, \ldots)$. According to
the SRPT discipline [15], [16], $D_\pi(t_1, t_2, \ldots)$ is minimized
if each downloaded chunk belongs to the request with the
fewest remaining chunks. This is satisfied by preemptive
SERPT-R under the conditions of Theorem 1, because all $L$
threads are assigned to the request with the fewest remaining
chunks. Therefore, for any given chunk departures $(t_1, t_2, \ldots)$,
preemptive SERPT-R minimizes $D_\pi(t_1, t_2, \ldots)$, i.e.,

$$D_{\text{SERPT-R}}(t_1, t_2, \ldots) = \min_{\pi} D_\pi(t_1, t_2, \ldots). \tag{18}$$

Let $F_\pi(t_1, t_2, \ldots)$ denote the cumulative distribution
function of the chunk departure process $(t_1, t_2, \ldots)$ under policy
$\pi$. Then, the average delay of policy $\pi$ can be expressed as

$$\overline{D}_\pi = \int D_\pi(t_1, t_2, \ldots)\, dF_\pi(t_1, t_2, \ldots). \tag{19}$$

According to Lemma 1, any two work-conserving policies $\pi_1$
and $\pi_2$ satisfy

$$F_{\pi_1}(t_1, t_2, \ldots) = F_{\pi_2}(t_1, t_2, \ldots), \quad \forall\, (t_1, t_2, \ldots). \tag{20}$$

Using (18)-(20) and the fact that preemptive SERPT-R is
a work-conserving policy, we can obtain for any
work-conserving policy $\pi$ that

$$\begin{aligned} \overline{D}_\pi &= \int D_\pi(t_1, t_2, \ldots)\, dF_\pi(t_1, t_2, \ldots) \\ &\ge \int D_{\text{SERPT-R}}(t_1, t_2, \ldots)\, dF_\pi(t_1, t_2, \ldots) \\ &= \int D_{\text{SERPT-R}}(t_1, t_2, \ldots)\, dF_{\text{SERPT-R}}(t_1, t_2, \ldots) \\ &= \overline{D}_{\text{SERPT-R}}. \end{aligned} \tag{21}$$

Hence, preemptive SERPT-R is delay-optimal among the
class of work-conserving policies. Finally, when preemption
is allowed, a delay-optimal policy must be work-conserving.
Hence, Theorem 1 follows.
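The sample-path step of this proof, namely that for fixed chunk departure times the average delay is smallest when each departing chunk is credited to the request with the fewest remaining chunks, can be illustrated with a toy example. The sketch below is not the paper's model: the function names are hypothetical, all requests are assumed to arrive at time 0, and the number of departures is assumed to equal the total number of required chunks.

```python
def average_delay(chunk_counts, departures, pick):
    """Average completion time when each departure is credited by `pick`.

    chunk_counts[i]: chunks needed by request i (all arrive at time 0).
    departures: chunk departure times, one per required chunk.
    pick(active, remaining): index of the request that gets the chunk.
    """
    remaining = list(chunk_counts)
    completion = [None] * len(remaining)
    for t in departures:
        active = [i for i, r in enumerate(remaining) if r > 0]
        i = pick(active, remaining)
        remaining[i] -= 1
        if remaining[i] == 0:
            completion[i] = t
    return sum(completion) / len(completion)

def fewest_remaining(active, remaining):
    """SRPT-style rule: credit the request with the fewest chunks left."""
    return min(active, key=lambda i: remaining[i])

def first_come(active, remaining):
    """FCFS-style rule: credit the earliest-indexed unfinished request."""
    return min(active)

counts = [3, 1, 2]
times = [1, 2, 3, 4, 5, 6]
srpt_delay = average_delay(counts, times, fewest_remaining)
fcfs_delay = average_delay(counts, times, first_come)
print(srpt_delay, fcfs_delay)
```

For the same six departure instants, the fewest-remaining rule gives an average delay of 10/3 against 13/3 for the first-come rule.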


Appendix B
Proof of Theorem 2

The case of $L \le d_{\min}$ was studied in Theorem 1, and we
only need to consider the case of $L > d_{\min}$. For notational
simplicity, we use policy P to denote preemptive SERPT-R
with $L > d_{\min}$, and policy Q to denote preemptive SERPT-R
under the conditions of Theorem 1, where $L \le d_{\min}$
holds. In particular, policy P is under the request parameters
$N$ and $(a_i, k_i, n_i)_{i=1}^{N}$ such that there exists an integer $j$
($1 \le j \le N$) satisfying $L > n_j - k_j + 1$, and policy Q
has some “virtual” chunks such that it is under the request
parameters $N$ and $(a_i, k_i, n_i')_{i=1}^{N}$ satisfying $L \le n_i' - k_i + 1$
for all $i = 1, 2, \ldots, N$.

When $L > d_{\min}$, the optimal policy of Theorem 2 can
be a non-work-conserving policy under the conditions of
Theorem 1, because there can be fewer than $L$ available chunks
to download. By Theorem 1, policy Q provides a lower bound
of $\overline{D}_{\text{opt}}$. On the other hand, policy P provides an upper bound
of $\overline{D}_{\text{opt}}$. The remaining task is to evaluate the delay gap
between policy P and policy Q when $L > d_{\min}$.

First, we construct the chunk departure sample paths
$(t_1, t_2, \ldots)$ of policy P and policy Q. Let $(t_1^l, t_2^l, \ldots)$ denote
the chunk departure time sequence of thread $l$, such that
the inter-departure time $\tau_i^l = t_{i+1}^l - t_i^l$ is i.i.d. exponentially
distributed with rate $\mu$. Under policy P, the chunk departure
time sequence $(t_1, t_2, \ldots)$ is obtained by taking the union
$\bigcup_{l=1}^{L} (t_1^l, t_2^l, \ldots)$ and deleting the chunk departures during the
idle periods of each thread $l$ under policy P. (Under policy
P, the idle periods are different across the threads.) Since
the chunk service time is memoryless, deleting some chunk
departures will not affect the service time distribution of other
chunks. Under policy Q, the chunk departure time sequence
$(t_1, t_2, \ldots)$ is obtained by taking the union $\bigcup_{l=1}^{L} (t_1^l, t_2^l, \ldots)$
and deleting the chunk departures when all $L$ threads are idle
under policy Q. (Under policy Q, all $L$ threads are active
or idle at the same time.) By this, we obtain two chunk
departure sample paths of policy P and policy Q with the
same probability to occur.

In the sequel, we will show that for any time $t$ and chunk
departure sample paths of policy P and policy Q constructed
above, policy P needs to download $L - d_{\min}$ or fewer
additional chunks after time $t$, so as to accomplish the same
number of requests that are completed under policy Q during
$(0, t]$.

Definition 7. [16] The state of the system is specified by
an infinite vector $\alpha = (\alpha_1, \alpha_2, \ldots)$ with non-negative,
non-increasing components. At any time, the coordinates of $\alpha$
are interpreted as follows: $\alpha_1$ is the maximum number of
remaining chunks among all requests, $\alpha_2$ is the next greatest
number of remaining chunks among all requests, and so on,
with duplications being explicitly repeated. Suppose that there
are $l$ unfinished requests in the system, then

$$\alpha_1 \ge \alpha_2 \ge \cdots \ge \alpha_l > 0 = \alpha_{l+1} = \alpha_{l+2} = \cdots. \tag{22}$$
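The state vector of Definition 7 and the tail-sum comparison used in Lemma 5 are easy to express in code. The sketch below is illustrative only; the function names are hypothetical, the infinite vector is stored as a finite list with the trailing zeros left implicit, and `slack` stands for the constant $L - d_{\min}$.

```python
def state_vector(remaining_chunks):
    """Non-increasing vector of remaining chunk counts of unfinished requests."""
    return sorted((c for c in remaining_chunks if c > 0), reverse=True)

def tail_sum(state, j):
    """Sum of coordinates j, j+1, ... (1-indexed); missing entries are zero."""
    return sum(state[j - 1:])

def tail_dominated(alpha, beta, slack):
    """Check sum_{i>=j} alpha_i <= sum_{i>=j} beta_i + slack for every j."""
    upto = max(len(alpha), len(beta)) + 1
    return all(tail_sum(alpha, j) <= tail_sum(beta, j) + slack
               for j in range(1, upto + 1))

alpha = state_vector([2, 0, 5, 2])
print(alpha, tail_sum(alpha, 2), tail_dominated([3, 2, 1], [3, 1], 2))
```

A finished request (zero remaining chunks) drops out of the vector, and the comparison only needs to be checked up to one coordinate past the longer of the two lists, since all later tail sums are zero.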

The key step for proving Theorem 2 is to establish the
following result:

Lemma 5. Let $\{\alpha(t), t \ge 0\}$ be the state process of policy P
and $\{\beta(t), t \ge 0\}$ be the state process of policy Q. If $L > d_{\min}$
and $\alpha(0) = \beta(0)$, then for the chunk departure sample paths
of policy P and policy Q described above, we have

$$\sum_{i=j}^{\infty} \alpha_i(t) \le \sum_{i=j}^{\infty} \beta_i(t) + L - d_{\min} \tag{23}$$

for all $t \ge 0$ and $j = 1, 2, \ldots$

In order to prove this result, we first establish the following
lemmas:

Lemma 6. Suppose that, under policy P, the system state at
time $t$ is $\alpha$ and at time $t + \Delta t$ is $\alpha'$. Further, suppose that,
under policy Q, the system state at time $t$ is $\beta$ and at time
$t + \Delta t$ is $\beta'$. If (i) $L > d_{\min}$, (ii) no arrivals occur during the
interval $(t, t + \Delta t]$, and (iii)

$$\sum_{i=j}^{\infty} \alpha_i \le \sum_{i=j}^{\infty} \beta_i + L - d_{\min}, \quad \forall\, j = 1, 2, \ldots, \tag{24}$$

then, for the chunk departure sample paths of policy P and
policy Q described above, we have

$$\sum_{i=j}^{\infty} \alpha_i' \le \sum_{i=j}^{\infty} \beta_i' + L - d_{\min}, \quad \forall\, j = 1, 2, \ldots \tag{25}$$

Proof: If $\sum_{i=j}^{\infty} \alpha_i' \le L - d_{\min}$, then (25) follows naturally.

If $\sum_{i=j}^{\infty} \alpha_i' \ge L - d_{\min} + 1$, the unfinished requests have
at least $L - d_{\min} + 1$ remaining chunks to download at time
$t + \Delta t$. By the definition of $d_i$, each unfinished request $i$
has $n_i - k_i = d_i - 1$ redundant chunks. Therefore, the system
must have at least a total number of $\sum_{i=j}^{\infty} \alpha_i' + d_{\min} - 1 \ge L$
available chunks at time $t + \Delta t$, and all $L$ threads are active
under policy P at time $t + \Delta t$.

Next, since there are no request arrivals during the interval
$(t, t + \Delta t]$, all $L$ threads must be kept active during $(t, t + \Delta t]$
under policy P. Suppose that $b$ chunks are downloaded under
policy P during $(t, t + \Delta t]$. Then, in the two chunk departure
sample paths constructed above, no more than $b$ chunks are
downloaded under policy Q during $(t, t + \Delta t]$, because the
threads can be idle.

Further, suppose that one chunk being served at time $t + \Delta t$
under policy P is associated to an $\alpha_m'$ satisfying $\alpha_m' > \alpha_j'$.
Then, according to the description of policy P (preemptive
SERPT-R), all the available chunks of the requests with $\alpha_j'$
or fewer remaining chunks must also be under service at time
$t + \Delta t$. We have just shown that the requests with $\alpha_j'$ or fewer
remaining chunks have a total number of at least $L$ available
chunks. Thus, the total number of chunks under service at time
$t + \Delta t$ is no less than $L + 1$, which is impossible. Therefore,
any request under service at time $t + \Delta t$ must be associated to an
$\alpha_m'$ satisfying $\alpha_m' \le \alpha_j'$. Since no arrivals occur during the
interval $(t, t + \Delta t]$, each downloaded chunk of policy P during
$(t, t + \Delta t]$ must belong to some request associated to an $\alpha_m'$
satisfying $\alpha_m' \le \alpha_j'$.

Using these facts, we can obtain

$$\sum_{i=j}^{\infty} \alpha_i' = \sum_{i=j}^{\infty} \alpha_i - b \le \sum_{i=j}^{\infty} \beta_i + L - d_{\min} - b \le \sum_{i=j}^{\infty} \beta_i' + L - d_{\min},$$

where the equality is due to the fact that each downloaded chunk of
policy P must belong to some request associated to an $\alpha_m'$
satisfying $\alpha_m' \le \alpha_j'$, the first inequality is due to (24), and
the second inequality is due to the fact that no more than $b$
chunks are downloaded under policy Q during $(t, t + \Delta t]$. ■

Lemma 7. Suppose that, under policy P, $\alpha'$ is obtained by
adding a request with $b$ remaining chunks to the system whose
state is $\alpha$. Further, suppose that, under policy Q, $\beta'$ is obtained
by adding a request with $b$ remaining chunks to the system
whose state is $\beta$. If

$$\sum_{i=j}^{\infty} \alpha_i \le \sum_{i=j}^{\infty} \beta_i + L - d_{\min}, \quad \forall\, j = 1, 2, \ldots, \tag{26}$$

then

$$\sum_{i=j}^{\infty} \alpha_i' \le \sum_{i=j}^{\infty} \beta_i' + L - d_{\min}, \quad \forall\, j = 1, 2, \ldots \tag{27}$$

Proof: The proof is similar to Lemma 3 in [16]. Without
loss of generality, we suppose that $b$ is the $l$th coordinate
of $\alpha'$ and the $m$th coordinate of $\beta'$. We consider the following
four cases:

Case 1: $l < j$, $m < j$. We can obtain $\sum_{i=j}^{\infty} \alpha_i' = \sum_{i=j-1}^{\infty} \alpha_i \le \sum_{i=j-1}^{\infty} \beta_i + L - d_{\min} = \sum_{i=j}^{\infty} \beta_i' + L - d_{\min}$.

Case 2: $l < j$, $m \ge j$. We have $\sum_{i=j}^{\infty} \alpha_i' = \sum_{i=j-1}^{\infty} \alpha_i \le b + \sum_{i=j}^{\infty} \alpha_i \le b + \sum_{i=j}^{\infty} \beta_i + L - d_{\min} = \sum_{i=j}^{\infty} \beta_i' + L - d_{\min}$.

Case 3: $l \ge j$, $m < j$. We have $\sum_{i=j}^{\infty} \alpha_i' = b + \sum_{i=j}^{\infty} \alpha_i \le \sum_{i=j-1}^{\infty} \alpha_i \le \sum_{i=j-1}^{\infty} \beta_i + L - d_{\min} = \sum_{i=j}^{\infty} \beta_i' + L - d_{\min}$.

Case 4: $l \ge j$, $m \ge j$. We have $\sum_{i=j}^{\infty} \alpha_i' = b + \sum_{i=j}^{\infty} \alpha_i \le b + \sum_{i=j}^{\infty} \beta_i + L - d_{\min} = \sum_{i=j}^{\infty} \beta_i' + L - d_{\min}$. ■

Using the initial state $\alpha(0) = \beta(0)$ together with Lemmas 6 and 7,
it is straightforward to prove Lemma 5. After Lemma 5 is
established, we are ready to prove Theorem 2.

Proof of Theorem 2: As explained above, we only need
to evaluate the delay gap between policy P and policy Q
when $L > d_{\min}$. Let the evolution of the system state under
some queueing discipline be on a space $(\Omega, \mathcal{F}, P)$. We assume
that the request arrival process $\{a_i, k_i, n_i\}_{i=1}^{N}$ is fixed for all
$\omega \in \Omega$. Let $\{\alpha(t), t \ge 0\}$ be the state process of policy P and
$\{\beta(t), t \ge 0\}$ be the state process of policy Q. Then, we have
$\alpha(0) = \beta(0) = 0$.

Suppose that under policy Q, there are $y$ request arrivals
and $z$ request departures during $(0, t]$. Then, there are $y - z$
requests in the system at time $t$ such that $\sum_{i=y-z+1}^{\infty} \beta_i(t) = 0$.
According to Lemma 5, we have $\sum_{i=y-z+1}^{\infty} \alpha_i(t) \le L - d_{\min}$.
Hence, under policy P, the system still needs to download
$L - d_{\min}$ or fewer chunks after time $t$, in order to complete
$z$ requests as in policy Q. Suppose that exactly $L - d_{\min}$
chunks are needed to complete $z$ requests. At time $t$, at least
$L - 1$ threads are assigned to serve the requests associated to
the $L - d_{\min}$ chunks that are most likely to result in request
departures. After one of these chunks is downloaded, at least
$L - 2$ threads are assigned to serve the requests associated to
the $L - d_{\min} - 1$ chunks that are most likely to result in request
departures. This procedure goes on until $L - d_{\min}$ chunks are
downloaded. Because the chunk download time of each thread
is i.i.d. exponentially distributed with mean $1/\mu$, the average
time for downloading these $L - d_{\min}$ chunks under policy P
is upper bounded by

$$\sum_{l=d_{\min}}^{L-1} \frac{1}{l\mu}, \tag{28}$$

where $\frac{1}{l\mu}$ is the average time for downloading one chunk when
$l$ threads are active. If fewer than $L - d_{\min}$ chunks are needed
to complete $z$ requests, the average downloading time will
be even shorter. Hence, the delay gap between policy P and
policy Q is no more than the term in (28), and Theorem 2 follows. ■


Appendix C
Proof of Theorem 3

First, the optimal policy under the conditions of Theorem 3
is feasible even if preemption is allowed. Hence, by Theorem 1,
preemptive SERPT-R provides a lower bound of $\overline{D}_{\text{opt}}$, i.e.,
the optimal delay of the policies satisfying the conditions
of Theorem 3. On the other hand, non-preemptive SEDPT-R
provides an upper bound of $\overline{D}_{\text{opt}}$. The remaining task is
to evaluate the delay gap between preemptive SERPT-R and
non-preemptive SEDPT-R.

For notational simplicity, we use policy P to denote
preemptive SERPT-R, and policy NP to denote non-preemptive
SEDPT-R. We will show that for any time $t$ and any given
sample path of chunk departures $(t_1, t_2, \ldots)$, policy NP needs
to download $L$ or fewer additional chunks after time $t$, so as
to accomplish the same number of requests that are completed
under policy P during $(0, t]$.

Definition 8. [16] The system state of preemptive SERPT-R
(policy P) is specified by an infinite vector $\beta = (\beta_1, \beta_2, \ldots)$
with non-negative, non-increasing components. At any time,
the coordinates of $\beta$ are interpreted as follows: $\beta_1$ is the
maximum number of remaining chunks among all requests,
$\beta_2$ is the next greatest number of remaining chunks among
all requests, and so on, with duplications being explicitly
repeated. Suppose that there are $l$ unfinished requests in the
system, then

$$\beta_1 \ge \beta_2 \ge \cdots \ge \beta_l > 0 = \beta_{l+1} = \beta_{l+2} = \cdots. \tag{29}$$

Definition 9. The system state of non-preemptive SEDPT-R
(policy NP) is specified by a pair of vectors $\{\alpha, \delta\}$, where
$\alpha = (\alpha_1, \alpha_2, \ldots)$ and $\delta = (\delta_1, \delta_2, \ldots)$ are two infinite vectors
with non-negative components. At any time, the coordinates of
$\alpha$ and $\delta$ are interpreted as follows: $\alpha_i$ is the number of chunks
to be downloaded for completing the request associated to the
$i$th coordinate, and $\delta_i$ is the number of threads assigned to
serve the request associated to the $i$th coordinate such that
$\sum_{i=1}^{\infty} \delta_i \le L$. Suppose that there are $l$ unfinished requests in
the system, then the coordinates of $\alpha$ and $\delta$ are sorted such
that

$$\alpha_1 - \delta_1 \ge \alpha_2 - \delta_2 \ge \cdots \ge \alpha_l - \delta_l, \tag{30}$$

$$\alpha_{l+1} - \delta_{l+1} = \alpha_{l+2} - \delta_{l+2} = \cdots = 0, \tag{31}$$

$$\alpha_i \begin{cases} > 0, & \text{if } i \le l; \\ = 0, & \text{if } i \ge l + 1, \end{cases} \tag{32}$$

$$\delta_i \begin{cases} \ge 0, & \text{if } i \le l; \\ = 0, & \text{if } i \ge l + 1. \end{cases} \tag{33}$$

Note that there exists an integer $i$ ($0 \le i \le l$) such that
$\alpha_1 - \delta_1 \ge \cdots \ge \alpha_i - \delta_i \ge 0 > \alpha_{i+1} - \delta_{i+1} \ge \cdots \ge \alpha_l - \delta_l$.


The key step for proving Theorem [3] is to establish the 
following result: 

Lemma 8. Let $\{\alpha(t), \delta(t), t \ge 0\}$ be the state process of
policy NP and $\{\beta(t), t \ge 0\}$ be the state process of policy
P. If $\alpha(0) = \delta(0) = \beta(0) = 0$, then for any given sample
path of chunk departures $(t_1, t_2, \ldots)$, we have

$$\sum_{i=j}^{\infty} [\alpha_i(t) - \delta_i(t)] \le \sum_{i=j}^{\infty} \beta_i(t) \tag{34}$$

for all $t \ge 0$ and $j = 1, 2, \ldots$


In order to prove this result, we first establish the following 
lemmas: 


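Lemma 9. Suppose that, under policy NP, $\{\alpha', \delta'\}$ is
obtained by completing a chunk at one of the $L$ threads in the
system whose state is $\{\alpha, \delta\}$. Further, suppose that, under
policy P, $\beta'$ is obtained by completing a chunk at one of
the $L$ threads in the system whose state is $\beta$. If

$$\sum_{i=j}^{\infty} [\alpha_i - \delta_i] \le \sum_{i=j}^{\infty} \beta_i, \quad \forall\, j = 1, 2, \ldots, \tag{35}$$

then

$$\sum_{i=j}^{\infty} [\alpha_i' - \delta_i'] \le \sum_{i=j}^{\infty} \beta_i', \quad \forall\, j = 1, 2, \ldots \tag{36}$$

Proof: Suppose that, under policy NP, there are $l$
unfinished requests at state $\{\alpha, \delta\}$. If $\sum_{i=j}^{\infty} [\alpha_i' - \delta_i'] \le 0$, then
the inequality (36) follows naturally. In the following, we will
consider the scenario of $\sum_{i=j}^{\infty} [\alpha_i' - \delta_i'] > 0$ in two cases.

Case 1: Under policy NP, the chunk departure does not
lead to a request completion. In this case, the thread that has
just completed a chunk will be reassigned to serve the request
associated to the $l$th coordinate such that $\alpha_l' - \delta_l' = \alpha_l - \delta_l - 1$.
Meanwhile, we have $\alpha_i' - \delta_i' = \alpha_i - \delta_i$ for all $i = 1, 2, \ldots, l - 1$,
and $\alpha_i' - \delta_i' = 0$ for all $i = l + 1, l + 2, \ldots$ Since
$\sum_{i=j}^{\infty} [\alpha_i' - \delta_i'] > 0$, we have $j \le l$. Therefore,
$\sum_{i=j}^{\infty} [\alpha_i' - \delta_i'] = \sum_{i=j}^{\infty} [\alpha_i - \delta_i] - 1 \le \sum_{i=j}^{\infty} \beta_i - 1 \le \sum_{i=j}^{\infty} \beta_i'$.

Case 2: Under policy NP, the chunk departure results
in a request departure. Suppose that the departed request is
associated to the $m$th coordinate at state $\{\alpha, \delta\}$ ($m \le l$). After
the request departure, the threads that were previously serving the
request associated to the $m$th coordinate will be reassigned to
serve the request associated to the $(l - 1)$th coordinate at state
$\{\alpha', \delta'\}$.

If $j > m$, then we have $\sum_{i=j}^{\infty} [\alpha_i' - \delta_i'] \le \sum_{i=j+1}^{\infty} [\alpha_i - \delta_i] - 1 \le \sum_{i=j+1}^{\infty} \beta_i - 1 \le \sum_{i=j}^{\infty} \beta_i'$.

If $j \le m$, then we have $\sum_{i=j}^{\infty} \alpha_i' = \sum_{i=j}^{\infty} \alpha_i - 1$ and
$\sum_{i=j}^{\infty} \delta_i' = \sum_{i=j}^{\infty} \delta_i$. Hence, $\sum_{i=j}^{\infty} [\alpha_i' - \delta_i'] = \sum_{i=j}^{\infty} [\alpha_i - \delta_i] - 1 \le \sum_{i=j}^{\infty} \beta_i - 1 \le \sum_{i=j}^{\infty} \beta_i'$. ■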


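Lemma 10. Suppose that, under policy NP, $\{\alpha', \delta'\}$ is
obtained by adding a request with $b$ remaining chunks to the
system whose state is $\{\alpha, \delta\}$. Further, suppose that, under
policy P, $\beta'$ is obtained by adding a request with $b$ remaining
chunks to the system whose state is $\beta$. If

$$\sum_{i=j}^{\infty} [\alpha_i - \delta_i] \le \sum_{i=j}^{\infty} \beta_i, \quad \forall\, j = 1, 2, \ldots, \tag{37}$$

then

$$\sum_{i=j}^{\infty} [\alpha_i' - \delta_i'] \le \sum_{i=j}^{\infty} \beta_i', \quad \forall\, j = 1, 2, \ldots \tag{38}$$

Proof: The proof is similar to Lemma 3 in [16]. Without
loss of generality, we suppose that $b$ is the $l$th coordinate
of $\{\alpha', \delta'\}$ and the $m$th coordinate of $\beta'$. We consider the
following four cases:

Case 1: $l < j$, $m < j$. We can obtain $\sum_{i=j}^{\infty} [\alpha_i' - \delta_i'] = \sum_{i=j-1}^{\infty} [\alpha_i - \delta_i] \le \sum_{i=j-1}^{\infty} \beta_i = \sum_{i=j}^{\infty} \beta_i'$.

Case 2: $l < j$, $m \ge j$. We have $\sum_{i=j}^{\infty} [\alpha_i' - \delta_i'] = \sum_{i=j-1}^{\infty} [\alpha_i - \delta_i] \le b + \sum_{i=j}^{\infty} [\alpha_i - \delta_i] \le b + \sum_{i=j}^{\infty} \beta_i = \sum_{i=j}^{\infty} \beta_i'$.

Case 3: $l \ge j$, $m < j$. We have $\sum_{i=j}^{\infty} [\alpha_i' - \delta_i'] = b + \sum_{i=j}^{\infty} [\alpha_i - \delta_i] \le \sum_{i=j-1}^{\infty} [\alpha_i - \delta_i] \le \sum_{i=j-1}^{\infty} \beta_i = \sum_{i=j}^{\infty} \beta_i'$.

Case 4: $l \ge j$, $m \ge j$. We have $\sum_{i=j}^{\infty} [\alpha_i' - \delta_i'] = b + \sum_{i=j}^{\infty} [\alpha_i - \delta_i] \le b + \sum_{i=j}^{\infty} \beta_i = \sum_{i=j}^{\infty} \beta_i'$. ■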

Using Lemma 9, Lemma 10 and the initial state $\alpha(0) =
\delta(0) = \beta(0) = 0$ at time $t = 0$, Lemma 8 follows
immediately. After Lemma 8 is established, we are ready to
prove Theorem 3.

Proof of Theorem [7} As explained above, we only need 
to evaluate the delay gap between policy NP and policy P. 
Let the evolution of the system state under some queueing 
discipline be on a space (f2, J 7 , P ). We assume that the request 
arrival process {aj, ft, riftEi is fixed f° r all w € O. Let 
{a(t),S(t),t > 0} be the state process of policy NP and 
{/3(f), t > 0} be the state process of policy P. Then, we have 
a(0) = (0(0) = <5(0) = 0. 

Suppose that under policy P, there are y request arrivals 
and z request departures during (0, f]. Then, there are only 
y — z unfinished requests in the system at time f such that 
'}2,Z y -z+i ft (ft = 0. According to Lemma [8] we have 

YZZy-z+ l«i(*) < YZZy-z+ lft(0- Henc ^ under P° lic y 
NP, the system still needs to download ft(i) or 

fewer chunks associated to a y - z +i(t), a y - z + 2 (t),... after 
time t, in order to complete z requests as in policy P. 

Suppose that exactly $\sum_{i=y-z+1}^{\infty} \delta_i(t)$ chunks are needed to
complete $z$ requests. At time $t$, there are $\sum_{i=1}^{y-z} \delta_i(t)$ threads
that are assigned to other requests. In order to accomplish $z$
requests, the system still needs to download $\sum_{i=y-z+1}^{\infty} \delta_i(t)$
chunks associated to $\alpha_{y-z+1}(t), \alpha_{y-z+2}(t), \ldots$, during
which time at most $\sum_{i=1}^{y-z} \delta_i(t)$ chunks associated to
$\alpha_1(t), \alpha_2(t), \ldots, \alpha_{y-z}(t)$ will be downloaded. This is because
each thread that is serving a request associated to
$\alpha_1(t), \alpha_2(t), \ldots, \alpha_{y-z}(t)$ at time $t$ will be reassigned to
serve a request associated to $\alpha_{y-z+1}(t), \alpha_{y-z+2}(t), \ldots$ after
completing the current chunk. Since $\sum_{i=1}^{\infty} \delta_i(t) \leq L$, the
system needs to download at most $L$ extra chunks to complete
$z$ requests, regardless of how many of these extra chunks
belong to each request. Because the chunk download time of
each thread is i.i.d. exponentially distributed with mean $1/\mu$,
the average time for the system to use $L$ threads to download $L$
chunks is $1/\mu$. If fewer than $\sum_{i=y-z+1}^{\infty} \delta_i(t)$ chunks are needed
to complete $z$ requests, the average downloading time will be
even shorter. Hence, the delay gap between policy NP and
policy P is no more than $1/\mu$, and the claimed bound follows. ■
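
The last step of the proof uses the fact that $L$ always-busy threads with i.i.d. exponential chunk times of mean $1/\mu$ need $1/\mu$ time on average to complete $L$ chunks. The Python sketch below checks this numerically by direct simulation. It is only an illustration: the function names, the seed, the number of runs, and the values $L = 4$, $\mu = 2$ are assumptions made here, not part of the paper.

```python
import heapq
import random


def time_for_chunks(num_threads, num_chunks, mu, rng):
    """Time until `num_chunks` chunks have departed when `num_threads`
    threads are always busy and every chunk takes an Exp(mu) time."""
    # Every thread starts one chunk at time 0.
    finish_times = [rng.expovariate(mu) for _ in range(num_threads)]
    heapq.heapify(finish_times)
    now = 0.0
    for _ in range(num_chunks):
        # The next departure is the earliest finishing thread.
        now = heapq.heappop(finish_times)
        # That thread immediately starts downloading another chunk.
        heapq.heappush(finish_times, now + rng.expovariate(mu))
    return now


def average_time(num_threads, num_chunks, mu, runs=20000, seed=1):
    """Monte Carlo estimate of the mean of time_for_chunks."""
    rng = random.Random(seed)
    total = sum(time_for_chunks(num_threads, num_chunks, mu, rng)
                for _ in range(runs))
    return total / runs


if __name__ == "__main__":
    L, mu = 4, 2.0  # illustrative values only
    print(average_time(L, L, mu), "vs. 1/mu =", 1.0 / mu)
```

Because departures occur at rate $L\mu$ while all threads are busy, the estimate should be close to $L/(L\mu) = 1/\mu$ for any choice of $L$.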

Appendix D 
Proof of Theorem^] 

The delay lower bound of $D_{\mathrm{opt}}$ is trivial. For the upper
bound of $D_{\mathrm{opt}}$, we need to combine the proof techniques of
Theorem 0 and Theorem 0 to quantify the delay gap between
preemptive SERPT-R and non-preemptive SEDPT-R under the
conditions of Theorem 0. By this, we can show that the delay
gap is upper bounded by the average time for downloading
$L$ extra chunks due to non-preemption and $L - d_{\min}$ extra
chunks due to low storage redundancy. Note that we only
need to evaluate the extra delay caused by non-preemption
during the time intervals when all $L$ threads are active. This
is because when the number of active threads is less than $L$,
all the available chunks of the unfinished requests are under
service at the same time, and thus non-preemption causes no
additional delay besides the extra delay caused by low storage
redundancy. By this, the theorem follows.

Appendix E 
Proof of Lemma0 

We first compare the chunk departure time instants among 
the class of work-conserving policies. 

Consider the departure time of the first chunk $t_1$. Because
$a_1 = s_1 = 0$ and all $L$ threads are active for $t > 0$, we have

$$t_1 = \min_{l=1,\ldots,L} X_l \tag{39}$$

for non-preemptive SEDPT-NR, where $X_l$ is the chunk downloading
time of thread $l$ if it does not switch to serve another
chunk before completing the current chunk. Under other
work-conserving policies, some thread may switch to serve another
chunk. If the thread has spent $\tau$ seconds on one chunk, the tail
probability for completing the current chunk under service is
$P(X > t + \tau \mid X > \tau)$. On the other hand, the tail probability
for switching to serve a new chunk is $P(X > t)$. Since the
chunk downloading time is i.i.d. NLU, it is stochastically
better to keep downloading the same chunk than to switch
to serve a new chunk. Therefore, $t_1$ under non-preemptive
SEDPT-NR is stochastically smaller than that under any other
work-conserving policy.

Next, suppose that $(t_1, t_2, \ldots, t_j)$ under non-preemptive
SEDPT-NR are stochastically smaller than those under any
other work-conserving policy. Let $R_l$ denote the remaining
time for thread $l$ to download the current chunk after $t_j$. Under
non-preemptive SEDPT-NR, since all $L$ threads are active at
all time $t > 0$, $t_{j+1}$ is determined as

$$t_{j+1} = \min_{l=1,\ldots,L} [t_j + R_l]. \tag{40}$$

Under other work-conserving policies, some thread may
switch to serve a new chunk before completing the current
chunk. Similarly to the above, one can show that $(t_1, t_2, \ldots, t_{j+1})$
under non-preemptive SEDPT-NR are stochastically smaller
than those under any other work-conserving policy. By
induction, the chunk departure instants $(t_1, t_2, \ldots)$ under
non-preemptive SEDPT-NR are stochastically smaller than those
under any other work-conserving policy.

Finally, since the downloading times of different chunks 
are i.i.d., service idling only postpones chunk departure time. 
Hence, the chunk departure time instants will be larger under 
non-work-conserving policies. Therefore, $(t_1, t_2, \ldots)$ under
non-preemptive SEDPT-NR are stochastically smaller than
those under any other online policy.
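
The argument above uses the NLU condition in the form $P(X > t + \tau \mid X > \tau) \leq P(X > t)$, which can be written without the division as $S(t + \tau) \leq S(t) S(\tau)$ for the tail function $S(x) = P(X > x)$. The following Python sketch tests this inequality on a finite grid for a few example distributions. The helper names and the example distributions are assumptions chosen here for illustration, and a grid check can only refute the property, not prove it.

```python
import math


def is_nlu(survival, horizon, steps=100, tol=1e-12):
    """Grid check of S(t + tau) <= S(t) * S(tau) for 0 <= t, tau <= horizon,
    where S(x) = P(X > x). The tolerance absorbs floating-point rounding."""
    grid = [horizon * k / steps for k in range(steps + 1)]
    return all(survival(t + tau) <= survival(t) * survival(tau) + tol
               for t in grid for tau in grid)


def exponential_survival(mu):
    """Tail of an exponential time with rate mu (equality case)."""
    return lambda x: math.exp(-mu * x)


def uniform_survival(low, high):
    """Tail of a time that is uniform on [low, high]."""
    def survival(x):
        if x <= low:
            return 1.0
        if x >= high:
            return 0.0
        return (high - x) / (high - low)
    return survival


def hyperexponential_survival(p, mu1, mu2):
    """Tail of a mixture of two exponentials with weights p and 1 - p."""
    return lambda x: p * math.exp(-mu1 * x) + (1 - p) * math.exp(-mu2 * x)


if __name__ == "__main__":
    print(is_nlu(exponential_survival(2.0), 3.0))
    print(is_nlu(uniform_survival(0.0, 1.0), 1.5))
    print(is_nlu(hyperexponential_survival(0.5, 1.0, 5.0), 2.0))
```

The exponential case satisfies the inequality with equality, the uniform case satisfies it strictly, and the mixture of two different exponentials violates it.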

Appendix F 
Proof of Lemma 3

We first construct a delay lower bound of $D_{\mathrm{opt}}$. Consider a
fixed sample path of the chunk departure instants $(t_1, t_2, \ldots)$.
The request departure instants $(c_{1,\pi}, c_{2,\pi}, \ldots)$ are determined
by the correspondence between the requests and the departed
chunks. Define $r_i(t)$ as the number of remaining chunks to
be downloaded after time $t$ for completing request $i$. If each
departed chunk belongs to an unfinished request $i$ with the
smallest $r_i(t)$, the number of unfinished requests is minimized.
By this, we obtain a lower bound on the sample-path average
delay $\frac{1}{N} \sum_{i=1}^{N} (c_{i,\pi} - a_i)$. According to Lemma 5, the chunk
departure instants $(t_1, t_2, \ldots)$ under non-preemptive SEDPT-NR
are stochastically smaller than those under any other policy.
By integrating $\frac{1}{N} \sum_{i=1}^{N} (c_{i,\pi} - a_i)$ over the distribution of
$(t_1, t_2, \ldots)$ under non-preemptive SEDPT-NR, a delay lower
bound of $D_{\mathrm{opt}}$ is obtained. On the other hand, non-preemptive
SEDPT-NR provides an upper bound of $D_{\mathrm{opt}}$. The remaining
task is to evaluate the delay gap between the delay lower bound
and non-preemptive SEDPT-NR.

Next, we utilize the proof techniques of Theorem 3 to
evaluate the delay gap between non-preemptive SEDPT-NR
and the above lower bound. For notational simplicity, we use
policy P to denote the above constructed policy that achieves a
lower bound of $D_{\mathrm{opt}}$, and policy NP to denote non-preemptive
SEDPT-NR. We will show that for any time $t$ and any given
sample path of chunk departures $(t_1, t_2, \ldots)$, policy NP needs
to download $L$ or fewer additional chunks after time $t$, so as
to accomplish the same number of requests that are completed
under policy P during $(0, t]$.

Definition 10. [16] The system state of policy P is specified
by an infinite vector $\beta = (\beta_1, \beta_2, \ldots)$ with non-negative,
non-increasing components. At any time, the coordinates of
$\beta$ are interpreted as follows: $\beta_1$ is the maximum number of
remaining chunks among all requests, $\beta_2$ is the next greatest
number of remaining chunks among all requests, and so on,
with duplications being explicitly repeated. Suppose that there
are $l$ unfinished requests in the system, then

$$\beta_1 \geq \beta_2 \geq \cdots \geq \beta_l > 0 = \beta_{l+1} = \beta_{l+2} = \cdots. \tag{41}$$

Definition 11. The system state of non-preemptive SEDPT-NR
(policy NP) is specified by a pair of vectors $\{\alpha, \delta\}$, where
$\alpha = (\alpha_1, \alpha_2, \ldots)$ and $\delta = (\delta_1, \delta_2, \ldots)$ are two infinite vectors
with non-negative components. At any time, the coordinates of
$\alpha$ and $\delta$ are interpreted as follows: $\alpha_i$ is the number of chunks
to be downloaded for completing the request associated to the
$i$th coordinate, and $\delta_i$ is the number of threads assigned to
serve the request associated to the $i$th coordinate such that
$\sum_{i=1}^{\infty} \delta_i \leq L$. Suppose that there are $l$ unfinished requests in
the system, then there exists an integer $m$ ($0 \leq m \leq l$) such
that the coordinates of $\alpha$ and $\delta$ satisfy

$$\alpha_1 - \delta_1 \geq \cdots \geq \alpha_m - \delta_m > 0 = \alpha_{m+1} - \delta_{m+1} = \cdots, \tag{42}$$

$$\alpha_i \begin{cases} > 0, & \text{if } i \leq l; \\ = 0, & \text{if } i \geq l + 1, \end{cases} \qquad \delta_i \begin{cases} \geq 0, & \text{if } i \leq l; \\ = 0, & \text{if } i \geq l + 1. \end{cases} \tag{43}$$


Lemma 11. Let $\{\alpha(t), \delta(t), t \geq 0\}$ be the state process of
policy NP and $\{\beta(t), t \geq 0\}$ be the state process of policy
P. If $\alpha(0) = \delta(0) = \beta(0) = 0$, then for any given sample
path of chunk departures $(t_1, t_2, \ldots)$, we have

$$\sum_{i=j}^{\infty} [\alpha_i(t) - \delta_i(t)] \leq \sum_{i=j}^{\infty} \beta_i(t) \tag{44}$$

for all $t \geq 0$ and $j = 1, 2, \ldots$
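
Inequality (44) compares tail sums of the two state descriptions. As a concrete reading of it, the Python sketch below checks the condition for finite prefixes of the state vectors, treating all omitted coordinates as zero. The function names and the example vectors are hypothetical and serve only to illustrate the inequality.

```python
def tail_sums(values):
    """Return [sum(values[j:]) for j = 0, 1, ...], computed right to left."""
    sums, running = [], 0
    for v in reversed(values):
        running += v
        sums.append(running)
    return sums[::-1]


def satisfies_44(alpha, delta, beta):
    """True if sum_{i>=j} (alpha_i - delta_i) <= sum_{i>=j} beta_i for all j.
    The inputs are finite prefixes; missing coordinates count as 0."""
    n = max(len(alpha), len(delta), len(beta))

    def pad(vector):
        return list(vector) + [0] * (n - len(vector))

    a, d, b = pad(alpha), pad(delta), pad(beta)
    # alpha_i - delta_i: chunks of request i not yet assigned to a thread.
    waiting = [ai - di for ai, di in zip(a, d)]
    return all(w <= s for w, s in zip(tail_sums(waiting), tail_sums(b)))


if __name__ == "__main__":
    # alpha - delta = (2, 1, 0) has tail sums (3, 1, 0).
    print(satisfies_44([3, 2, 1], [1, 1, 1], [3, 1]))  # beta tails (4, 1, 0)
    print(satisfies_44([3, 2, 1], [1, 1, 1], [1, 1]))  # beta tails (2, 1, 0)
```

In the first call every tail sum of $\alpha - \delta$ is dominated by the corresponding tail sum of $\beta$; in the second call the comparison fails at $j = 1$.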


Lemma 11 can be obtained from the following lemmas:

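Lemma 12. Suppose that, under policy NP, $\{\alpha', \delta'\}$ is
obtained by completing a chunk at one of the $L$ threads in
the system whose state is $\{\alpha, \delta\}$. Further, suppose that, under
policy P, $\beta'$ is obtained by completing a chunk at one of the
$L$ threads in the system whose state is $\beta$. If

$$\sum_{i=j}^{\infty} [\alpha_i - \delta_i] \leq \sum_{i=j}^{\infty} \beta_i, \quad \forall\, j = 1, 2, \ldots, \tag{45}$$

then

$$\sum_{i=j}^{\infty} [\alpha'_i - \delta'_i] \leq \sum_{i=j}^{\infty} \beta'_i, \quad \forall\, j = 1, 2, \ldots \tag{46}$$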


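Proof: If $\sum_{i=j}^{\infty} [\alpha'_i - \delta'_i] = 0$, then the inequality
follows naturally.

If $\sum_{i=j}^{\infty} [\alpha'_i - \delta'_i] > 0$, suppose that, under policy NP, there
are $m$ requests satisfying $\alpha_i - \delta_i > 0$ at state $\{\alpha, \delta\}$. After
the chunk departure, the thread that just became idle will be
assigned to serve a request associated to the smallest positive
$\alpha_i - \delta_i$. This tells us that (i) $\alpha'_i - \delta'_i = \alpha_i - \delta_i$ for $i =
1, 2, \ldots, m-1$; (ii) $\alpha'_m - \delta'_m = \alpha_m - \delta_m - 1$; and (iii) $\alpha'_i - \delta'_i =
\alpha_i - \delta_i = 0$ for $i = m+1, m+2, \ldots$ Since $\sum_{i=j}^{\infty} [\alpha'_i - \delta'_i] > 0$,
we have $j \leq m$. Hence, $\sum_{i=j}^{\infty} [\alpha'_i - \delta'_i] = \sum_{i=j}^{\infty} [\alpha_i - \delta_i] - 1 \leq
\sum_{i=j}^{\infty} \beta_i - 1 \leq \sum_{i=j}^{\infty} \beta'_i$, where the last step holds because one
chunk departure reduces $\sum_{i=j}^{\infty} \beta_i$ by at most one. ■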


Lemma 13. Suppose that, under policy NP, $\{\alpha', \delta'\}$ is
obtained by adding a request with $b$ remaining chunks to the
system whose state is $\{\alpha, \delta\}$. Further, suppose that, under
policy P, $\beta'$ is obtained by adding a request with $b$ remaining
chunks to the system whose state is $\beta$. If

$$\sum_{i=j}^{\infty} [\alpha_i - \delta_i] \leq \sum_{i=j}^{\infty} \beta_i, \quad \forall\, j = 1, 2, \ldots, \tag{47}$$

then

$$\sum_{i=j}^{\infty} [\alpha'_i - \delta'_i] \leq \sum_{i=j}^{\infty} \beta'_i, \quad \forall\, j = 1, 2, \ldots \tag{48}$$

The proof of Lemma 13 is the same as that of Lemma
10. Now, we are ready to prove Lemma 3.

Proof of Lemma 3: As explained above, we only need
to evaluate the delay gap between policy NP and policy P.
Let the evolution of the system state under some queueing
discipline be on a space $(\Omega, \mathcal{F}, P)$. We assume that the request
arrival process $\{a_i, k_i, n_i\}_{i=1}^{N}$ is fixed for all $\omega \in \Omega$. Let
$\{\alpha(t), \delta(t), t \geq 0\}$ be the state process of policy NP and
$\{\beta(t), t \geq 0\}$ be the state process of policy P. Then, we have
$\alpha(0) = \beta(0) = \delta(0) = 0$.

Suppose that under policy P, there are $y$ request arrivals
and $z$ request departures during $(0, t]$. Then, there are $y - z$
requests in the system at time $t$ such that $\sum_{i=y-z+1}^{\infty} \beta_i(t) =
0$. According to Lemma 11, we have $\sum_{i=y-z+1}^{\infty} \alpha_i(t) \leq
\sum_{i=y-z+1}^{\infty} \delta_i(t) \leq L$. Hence, under policy NP, the system
still needs to download $L$ or fewer chunks associated to
$\alpha_{y-z+1}(t), \alpha_{y-z+2}(t), \ldots$ after time $t$, in order to complete
$z$ requests as in policy P. Further, $\sum_{i=y-z+1}^{\infty} \delta_i(t) \leq L$ tells
us that the services of these chunks have already started by
time $t$. Therefore, the average remaining downloading time of
these chunks after time $t$ is no more than

$$E\left\{ \max_{l=1,\ldots,L} X_l \right\}. \tag{49}$$

Therefore, the delay gap between policy NP and policy P is
no more than $E\{\max_{l=1,\ldots,L} X_l\}$, and Lemma 3 is proven. ■
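
To get a feel for the size of the gap $E\{\max_{l=1,\ldots,L} X_l\}$, one can evaluate it numerically from the identity $E\{\max_l X_l\} = \int_0^\infty (1 - F(x)^L)\,dx$ for i.i.d. non-negative times with CDF $F$. The Python sketch below does this with a midpoint rule. The function names, the truncation point, and the two example distributions (uniform on $[0, 1]$, and exponential as the boundary case) are assumptions made here for illustration.

```python
import math


def expected_max(cdf, num_threads, upper, steps=100000):
    """Midpoint-rule approximation of
    E[max_l X_l] = integral_0^upper (1 - F(x)^L) dx
    for L = num_threads i.i.d. non-negative times with CDF F.
    `upper` must be large enough that F(upper) is essentially 1."""
    h = upper / steps
    return h * sum(1.0 - cdf((k + 0.5) * h) ** num_threads
                   for k in range(steps))


def uniform_cdf(x):
    """CDF of a time that is uniform on [0, 1]."""
    return min(max(x, 0.0), 1.0)


def exponential_cdf(mu):
    """CDF of an exponential time with rate mu."""
    return lambda x: 1.0 - math.exp(-mu * x)


if __name__ == "__main__":
    L = 4  # illustrative number of threads
    print(expected_max(uniform_cdf, L, 1.0))            # exact value: L / (L + 1)
    print(expected_max(exponential_cdf(1.0), L, 50.0))  # exact value: 1 + 1/2 + 1/3 + 1/4
```

For the uniform example the gap is $L/(L+1)$, which stays below the largest possible chunk time, while for the exponential example it grows like the harmonic number $\sum_{l=1}^{L} 1/l$ divided by $\mu$.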

Appendix G 
Proof of Theorem 5

We will prove this theorem in three steps: in Step 1, we
will construct a virtual policy which provides a delay lower
bound of $D_{\mathrm{opt}}$; in Step 2, we will compare the chunk departure
sample paths of the constructed virtual policy and
non-preemptive SEDPT-NR; in Step 3, we will evaluate the delay
gap between the delay lower bound and the average delay of
non-preemptive SEDPT-NR. The details are provided in the
sequel.

Step 1: We first construct a virtual policy which provides a
delay lower bound of $D_{\mathrm{opt}}$. Define $r(t)$ as the total number
of remaining chunks to be downloaded for completing all the
unfinished requests at time $t$. We construct a virtual policy P
as follows: If $r(t) \geq L$ at time $t$, each thread is assigned to
serve one chunk and will not switch to serve another chunk
until it has completed the current chunk. If $0 < r(t) < L$,
suppose that there are $L - r(t)$ "virtual" chunks, such that each
thread is assigned to serve one chunk and will not switch to
serve another chunk until it has completed the current chunk.
If $r(t) = 0$, all $L$ threads are idle. Further, under the virtual
policy P, each departed chunk belongs to an unfinished request
with the fewest remaining chunks. Similar to Lemma 2, we
can obtain the following result:

Lemma 14. If the chunk downloading time is i.i.d. NLU,
then for given request parameters $N$ and $(a_i, k_i, n_i)_{i=1}^{N}$, the
constructed chunk departure instants $(t_1, t_2, \ldots)$ of policy P
are stochastically smaller than those under any online policy.

Proof: We first compare the chunk departure times among 
the class of work-conserving policies. 

Let us consider the departure time of the first chunk $t_1$.
Because $a_1 = s_1 = 0$ and all $L$ threads are active for $t > 0$,
we have

$$t_1 = \min_{l=1,\ldots,L} X_l \tag{50}$$

for the constructed chunk departures, where $X_l$ is the chunk
downloading time of thread $l$ if it does not switch to serve
another chunk before completing the current chunk. Under
other work-conserving policies, some thread may switch to
serve another chunk. We have shown that, if the chunk
downloading time is i.i.d. NLU, it is stochastically better to
keep downloading the same chunk than to switch to serve
a new chunk. Therefore, $t_1$ under policy P is stochastically
smaller than that under any work-conserving policy.

Next, suppose that the constructed chunk departure instants
$(t_1, t_2, \ldots, t_j)$ of policy P are stochastically smaller than
those under any work-conserving policy. Let $R_l$ denote the
remaining downloading time of thread $l$ for serving the current
chunk after time $\max\{s_{j+1}, t_j\}$. Under policy P, all $L$ threads
are active after time $\max\{s_{j+1}, t_j\}$. Hence, $t_{j+1}$ is determined
as

$$t_{j+1} = \min_{l=1,\ldots,L} [\max\{s_{j+1}, t_j\} + R_l]. \tag{51}$$

Under other work-conserving policies, some thread may
switch to serve a new chunk before completing the current
chunk. Similarly to the above discussion, one can show that
the chunk departure instants $(t_1, t_2, \ldots, t_{j+1})$ of policy P are
stochastically smaller than those under any work-conserving
policy. By induction, the constructed chunk departure instants
$(t_1, t_2, \ldots)$ of policy P are stochastically smaller than those
under any work-conserving policy.

Finally, since the downloading times of different chunks
are i.i.d., service idling only postpones chunk departure time.
Hence, the chunk departure times will be larger under
non-work-conserving policies. Therefore, the constructed chunk
departure instants $(t_1, t_2, \ldots)$ of policy P are stochastically
smaller than those under any online policy. ■

Under policy P, each departed chunk belongs to an unfinished
request with the fewest remaining chunks, such that
the number of unfinished requests is minimized. According
to Lemma 14, the constructed chunk departure instants
$(t_1, t_2, \ldots)$ of policy P are stochastically smaller than those
under any online policy. By taking the expectation over the
distribution of $(t_1, t_2, \ldots)$, one can show that the virtual policy
P provides a delay lower bound of $D_{\mathrm{opt}}$. On the other hand,
non-preemptive SEDPT-NR provides an upper bound of $D_{\mathrm{opt}}$.
The remaining task is to evaluate the delay gap between policy
P and non-preemptive SEDPT-NR.

Step 2: We now study the chunk departure sample paths of
policy P and non-preemptive SEDPT-NR. For notational
simplicity, we use policy NP to denote non-preemptive SEDPT-NR.
Similar to the proof of Lemma 3, we define the system
states of policy P and policy NP. Let $\{\alpha(t), \delta(t), t \geq 0\}$ be
the state process of policy NP and $\{\beta(t), t \geq 0\}$ be the state
process of policy P. Suppose that $\alpha(0) = \delta(0) = \beta(0) = 0$.

Lemma 15. If $\alpha(0) = \delta(0) = \beta(0) = 0$, then for any chunk
departure sample path of policy NP, there exists a chunk
departure sample path of policy P, such that for any time
$t$ the number of chunks downloaded during $(0, t]$ under the
sample path of policy P is no more than $L - 1$ plus the number
of chunks downloaded during $(0, t]$ under the sample path of
policy NP, i.e.,

$$\sum_{i=1}^{\infty} \alpha_i(t) \leq \sum_{i=1}^{\infty} \beta_i(t) + L - 1, \quad \forall\, t \geq 0. \tag{52}$$

Proof: We partition the system service duration of policy
NP into a sequence of time intervals $(\tau_1, \nu_1]$, $(\nu_1, \tau_2]$, $(\tau_2, \nu_2]$,
$(\nu_2, \tau_3]$, ..., such that $r(t) \leq L - 1$ for $t \in (\tau_i, \nu_i]$ and
$r(t) \geq L$ for $t \in (\nu_i, \tau_{i+1}]$ for $i = 1, 2, \ldots$ Therefore,
under policy NP, at most $L - 1$ threads are active during
the intervals $(\tau_i, \nu_i]$ and all $L$ threads are active during the
intervals $(\nu_i, \tau_{i+1}]$. We construct a "virtual" policy Q based
on policy NP: After time $\tau_i$ there are at most $L - 1$ remaining
chunks to be downloaded. Under policy Q, these remaining
chunks are completed immediately after time $\tau_i$ such that the
$L$ threads are idle during $(\tau_i, \nu_i]$. During $(\nu_i, \tau_{i+1}]$, policy Q
is defined according to the same principle as policy P: "virtual
chunks" are used when there are fewer than $L$ remaining chunks
such that all $L$ threads are active under policy Q until there is
no remaining chunk to download. The system state of policy
Q is specified by an infinite vector $\gamma = (\gamma_1, \gamma_2, \ldots)$ with
non-negative, non-increasing components. At any time, the
coordinates of $\gamma$ are interpreted as follows: $\gamma_1$ is the maximum
number of remaining chunks among all requests, $\gamma_2$ is the next
greatest number of remaining chunks among all requests, and
so on, with duplications being explicitly repeated.

Next, we prove that

$$\sum_{i=1}^{\infty} \alpha_i(t) \leq \sum_{i=1}^{\infty} \gamma_i(t) + L - 1 \tag{53}$$

for all $t \geq 0$. During $(\tau_i, \nu_i]$, we have $\sum_{i=1}^{\infty} \alpha_i(t) = r(t) \leq
L - 1$ and $\gamma_i(t) = 0$. Hence, (53) holds during $(\tau_i, \nu_i]$.

At time $\nu_i$, policy NP has at most $L - 1$ extra chunks,
compared to policy Q. Further, the $L$ threads of policy NP
start downloading earlier than time $\nu_i$, while the $L$ threads
of policy Q start downloading exactly at time $\nu_i$.
Therefore, (53) must hold during $(\nu_i, \tau_{i+1}]$. By induction, (53)
holds for all $t \geq 0$.

Further, we show that there exists a chunk departure sample
path of policy P such that

$$\sum_{i=1}^{\infty} \gamma_i(t) \leq \sum_{i=1}^{\infty} \beta_i(t), \quad \forall\, t \geq 0. \tag{54}$$

During $(\tau_i, \nu_i]$, policy Q satisfies $\sum_{i=1}^{\infty} \gamma_i(t) = 0$ and (54)
follows. During $(\nu_i, \tau_{i+1}]$, policy Q satisfies the same principle
as policy P, except for their different initial states at time $\nu_i$.
In particular, policy Q has no chunk to download before time
$\nu_i$ and policy P may have some chunks not completed yet
before time $\nu_i$. Therefore, policy P needs to complete these
remaining chunks to have the same state as policy Q. Since
policy P and policy Q satisfy the same principle, there must
exist a chunk departure sample path of policy P such that (54)
holds during $(\nu_i, \tau_{i+1}]$. By induction, (54) holds for all $t \geq 0$.
Combining (53) and (54), Lemma 15 follows. ■

Step 3: We will show that for any time $t$ and the chunk
departure sample paths constructed above, policy NP needs
to download $2L - 1$ or fewer additional chunks after time
$t$, so as to accomplish the same number of requests that are
completed under policy P during $(0, t]$. Towards this goal, we
need to prove the following lemma:

Lemma 16. Let $\{\alpha(t), \delta(t), t \geq 0\}$ be the state process of
policy NP and $\{\beta(t), t \geq 0\}$ be the state process of policy
P. If $\alpha(0) = \delta(0) = \beta(0) = 0$, then under the chunk departure
sample paths of policy NP and policy P mentioned above,
we have

$$\sum_{i=1}^{\infty} \beta_i(t) + \sum_{i=j}^{\infty} [\alpha_i(t) - \delta_i(t)] \leq \sum_{i=j}^{\infty} \beta_i(t) + \sum_{i=1}^{\infty} \alpha_i(t) \tag{55}$$

for all $t \geq 0$ and $j = 1, 2, \ldots$

Lemma 16 can be easily obtained from the following two
lemmas:

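Lemma 17. Suppose that, under policy NP, the system state
at time $t$ is $\{\alpha, \delta\}$ and at time $t + \Delta t$ is $\{\alpha', \delta'\}$. Further,
suppose that, under policy P, the system state at time $t$ is $\beta$
and at time $t + \Delta t$ is $\beta'$. If (i) no arrivals occur during the
interval $(t, t + \Delta t]$ and (ii)

$$\sum_{i=1}^{\infty} \beta_i + \sum_{i=j}^{\infty} [\alpha_i - \delta_i] \leq \sum_{i=j}^{\infty} \beta_i + \sum_{i=1}^{\infty} \alpha_i, \quad \forall\, j = 1, 2, \ldots, \tag{56}$$

then

$$\sum_{i=1}^{\infty} \beta'_i + \sum_{i=j}^{\infty} [\alpha'_i - \delta'_i] \leq \sum_{i=j}^{\infty} \beta'_i + \sum_{i=1}^{\infty} \alpha'_i, \quad \forall\, j = 1, 2, \ldots \tag{57}$$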

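Proof: If $\sum_{i=j}^{\infty} [\alpha'_i - \delta'_i] = 0$, then the inequality (57)
follows naturally.

If $\sum_{i=j}^{\infty} [\alpha'_i - \delta'_i] > 0$, suppose that $b$ chunks are downloaded
under policy NP during $(t, t + \Delta t]$, and $d$ chunks are
downloaded under policy P. Then, we have

$$\sum_{i=1}^{\infty} \alpha_i - \sum_{i=1}^{\infty} \alpha'_i = b, \tag{58}$$

$$\sum_{i=1}^{\infty} \beta_i - \sum_{i=1}^{\infty} \beta'_i = d. \tag{59}$$

Further, under policy NP, the smallest and yet positive $\alpha_i - \delta_i$
will decrease by one after each chunk departure. Hence, we
have

$$\sum_{i=j}^{\infty} [\alpha'_i - \delta'_i] = \sum_{i=j}^{\infty} [\alpha_i - \delta_i] - b. \tag{60}$$

Using (56), (58)-(60), we obtain
$\sum_{i=1}^{\infty} \beta'_i + \sum_{i=j}^{\infty} [\alpha'_i - \delta'_i]
= \sum_{i=1}^{\infty} \beta'_i + \sum_{i=j}^{\infty} [\alpha_i - \delta_i] - b
= \sum_{i=1}^{\infty} \beta'_i + \sum_{i=j}^{\infty} [\alpha_i - \delta_i] + \sum_{i=1}^{\infty} \alpha'_i - \sum_{i=1}^{\infty} \alpha_i
\leq \sum_{i=1}^{\infty} \beta'_i + \sum_{i=1}^{\infty} \alpha'_i + \sum_{i=j}^{\infty} \beta_i - \sum_{i=1}^{\infty} \beta_i
\leq \sum_{i=1}^{\infty} \alpha'_i + \sum_{i=j}^{\infty} \beta'_i$,
where the last step uses (59) and the fact that
$\sum_{i=j}^{\infty} \beta_i - \sum_{i=j}^{\infty} \beta'_i \leq d$. ■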


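Lemma 18. Suppose that, under policy NP, $\{\alpha', \delta'\}$ is
obtained by adding a request with $b$ remaining chunks to the
system whose state is $\{\alpha, \delta\}$. Further, suppose that, under
policy P, $\beta'$ is obtained by adding a request with $b$ remaining
chunks to the system whose state is $\beta$. If

$$\sum_{i=1}^{\infty} \beta_i + \sum_{i=j}^{\infty} [\alpha_i - \delta_i] \leq \sum_{i=j}^{\infty} \beta_i + \sum_{i=1}^{\infty} \alpha_i, \quad \forall\, j = 1, 2, \ldots, \tag{61}$$

then

$$\sum_{i=1}^{\infty} \beta'_i + \sum_{i=j}^{\infty} [\alpha'_i - \delta'_i] \leq \sum_{i=j}^{\infty} \beta'_i + \sum_{i=1}^{\infty} \alpha'_i, \quad \forall\, j = 1, 2, \ldots \tag{62}$$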

The proof of Lemma 18 is quite similar to that of Lemma
10 and is thus omitted. We now prove Theorem 5.

Proof of Theorem 5: As explained above, we only need
to evaluate the delay gap between policy NP and policy P.
Let the evolution of the system state under some queueing
discipline be on a space $(\Omega, \mathcal{F}, P)$. We assume that the request
arrival process $\{a_i, k_i, n_i\}_{i=1}^{N}$ is fixed for all $\omega \in \Omega$. Let
$\{\alpha(t), \delta(t), t \geq 0\}$ be the state process of policy NP and
$\{\beta(t), t \geq 0\}$ be the state process of policy P. Then, we have
$\alpha(0) = \beta(0) = \delta(0) = 0$.

Suppose that under policy P, there are $y$ request arrivals
and $z$ request departures during $(0, t]$. Then, there are $y - z$
requests in the system at time $t$ such that $\sum_{i=y-z+1}^{\infty} \beta_i(t) = 0$.
According to Lemma 16 and (52), we have $\sum_{i=y-z+1}^{\infty} \alpha_i(t) \leq
\sum_{i=y-z+1}^{\infty} \delta_i(t) + \sum_{i=1}^{\infty} [\alpha_i(t) - \beta_i(t)] \leq 2L - 1$. Hence, under
policy NP, the system still needs to download at most $2L - 1$ chunks
after time $t$, in order to complete $z$ requests as in policy P.
Therefore, the average downloading time of these extra chunks
after time $t$ is no more than

$$E\left\{ \max_{l=1,\ldots,L} X_l \right\} + E\left\{ \max_{l=1,\ldots,L-1} X_l \right\}. \tag{63}$$

Hence, the delay gap between policy NP and policy P is no
more than $E\{\max_{l=1,\ldots,L} X_l\} + E\{\max_{l=1,\ldots,L-1} X_l\}$. By
this, Theorem 5 is proven. ■


Appendix H 
Proof of Theorem 6

When preemption is allowed, the proof of Theorem 0 can
be directly used to show that the same delay bound still holds, with $D_{\mathrm{opt}}$
representing the optimal delay performance in the preemptive
case. Further, preemptive SEDPT-WCR can achieve a shorter
average delay than non-preemptive SEDPT-NR when preemption
is allowed. Then, Theorem 6 follows.


Appendix I 
Proof of Lemma 4

We first compare the chunk departure time sequence among
the class of work-conserving policies. Since $d_{\min} \geq L$, all $L$
threads are kept active whenever there are unfinished requests.


Let us consider the departure time of the first chunk $t_1$.
Since $a_1 = s_1 = 0$, for any non-preemptive work-conserving
policy, we have

$$t_1 = \min_{l=1,\ldots,L} X_l. \tag{64}$$

Therefore, the distribution of $t_1$ is invariant under any
non-preemptive work-conserving policy.

Next, suppose that $(t_1, t_2, \ldots, t_j)$ under non-preemptive
SEDPT-R are stochastically smaller than those under any other
work-conserving policy. Let $\tau_l$ denote the time that thread $l$
has spent on the current chunk up to time $t_j$, and $R_l$ denote
the remaining time for thread $l$ to download the current chunk
after time $t_j$. The tail distribution of $R_l$ is given by

$$P(R_l > t \mid \tau_l = \tau) = P(X > t + \tau \mid X > \tau). \tag{65}$$

By (65) and the condition that the chunk downloading time
distribution is NSU, the remaining downloading time $R_l$ of the
case $\tau_l = 0$ is stochastically smaller than that of the case $\tau_l =
\tau > 0$. In other words, the remaining downloading time $R_l$ is
stochastically smaller if thread $l$ switches to download a new
chunk at time $t_j$. For any non-preemptive work-conserving
policy, $t_{j+1}$ is determined as

$$t_{j+1} = \min_{l=1,\ldots,L} [\max\{s_{j+1}, t_j\} + R_l]. \tag{66}$$

Hence, $(t_1, t_2, \ldots, t_{j+1})$ is stochastically smaller if all $L$
threads switch to download a new chunk at time $t_j$. This
only occurs under SEDPT-R, where all $L$ threads are assigned
to serve the same request. Therefore, $(t_1, t_2, \ldots, t_{j+1})$ under
non-preemptive SEDPT-R are stochastically smaller than those
under any other work-conserving policy.

By induction, $(t_1, t_2, \ldots, t_N)$ under non-preemptive
SEDPT-R are stochastically smaller than those under any
other work-conserving policy.

Finally, since the downloading times of different chunks
are i.i.d., service idling only postpones chunk departure time.
Hence, the chunk departure times will be larger under
non-work-conserving policies. Therefore, $(t_1, t_2, \ldots, t_N)$ under
non-preemptive SEDPT-R are stochastically smaller than those
under any other online policy.
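
The NSU step above says that a chunk that has already been in service for some time $\tau > 0$ has a stochastically longer remaining time than a fresh chunk, i.e. $P(X > t + \tau \mid X > \tau) \geq P(X > t)$. The Python sketch below works this out in closed form for a mixture of two exponentials, a common example of such a distribution. The parameter values and function names are assumptions chosen for illustration only.

```python
import math


def survival(x, p, mu1, mu2):
    """P(X > x) for a mixture of Exp(mu1) and Exp(mu2) with weights p, 1 - p."""
    return p * math.exp(-mu1 * x) + (1 - p) * math.exp(-mu2 * x)


def residual_tail(t, age, p, mu1, mu2):
    """P(X > t + age | X > age): tail of the remaining time of a chunk
    that has already been downloaded for `age` seconds."""
    return survival(t + age, p, mu1, mu2) / survival(age, p, mu1, mu2)


def mean_residual(age, p, mu1, mu2):
    """E[X - age | X > age], i.e. the integral of the survival function
    from `age` to infinity divided by P(X > age)."""
    integral = (p * math.exp(-mu1 * age) / mu1
                + (1 - p) * math.exp(-mu2 * age) / mu2)
    return integral / survival(age, p, mu1, mu2)


if __name__ == "__main__":
    p, mu1, mu2 = 0.5, 1.0, 5.0  # illustrative parameters
    print("fresh chunk, mean remaining time:", mean_residual(0.0, p, mu1, mu2))
    print("chunk of age 1, mean remaining time:", mean_residual(1.0, p, mu1, mu2))
    print("tails at t = 1:", residual_tail(1.0, 0.0, p, mu1, mu2),
          residual_tail(1.0, 1.0, p, mu1, mu2))
```

With these parameters the fresh chunk has the shorter mean remaining time and the smaller tail probability, which is the direction of the comparison used in the proof. Comparing means is weaker than the stochastic ordering in the lemma and is shown only as a quick sanity check.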

Appendix J 
Proof of Theorem 7

Since $k_i = 1$ for all $i$, each file only has one remaining
chunk. Hence, the file departure process $(c_{1,\pi}, c_{2,\pi}, \ldots, c_{N,\pi})$
is a permutation of $(t_1, t_2, \ldots, t_N)$ and

$$\sum_{i=1}^{N} E\{t_i\} = \sum_{i=1}^{N} E\{c_{i,\pi}\}. \tag{67}$$

In Lemma 4, it was shown that the chunk departure instants
$(t_1, t_2, \ldots, t_N)$ under non-preemptive SEDPT-R are
stochastically smaller than those under any other online policy.
Therefore, non-preemptive SEDPT-R minimizes $\sum_{i=1}^{N} E\{t_i\}$
[16]. By this, Theorem 7 is proven.


